[Dbix-class] Maybe OT - How to create a result set based on
'similarity'?
Jason Kohles
email at jasonkohles.com
Fri Mar 2 17:09:08 GMT 2007
On Mar 2, 2007, at 10:42 AM, Mario Minati wrote:
> I would like to find out if there are already companies in my
> addressbook (DB) which are similar to a given name to avoid double
> entries.
>
> How to measure similarity:
> I'am thinking of the hammingdistance. That means the difference
> between Linux and Linus is 1 as there is one letter different. The
> distance between Linux and Lisa is 3 as there is one letter more
> and two are different.
>
> Does anyone have an idea how to realize that?
> Can one realize this with code running on the database (PL/SQL or
> something) or is there a way doing that with DBIx::Class (drawback:
> all data had to read before processing).
>
In addition to my other response, you might find this interesting,
although I would be a bit afraid of how long this might take in a
large database...
From http://www.merriampark.com/ldplsql.htm
Levenshtein Distance Algorithm: Oracle PL/SQL Implementation
by Barbara Boehmer (baboehme at hotmail.com)
CREATE OR REPLACE FUNCTION ld -- Levenshtein distance
(p_source_string IN VARCHAR2,
p_target_string IN VARCHAR2)
RETURN NUMBER
DETERMINISTIC
AS
v_length_of_source NUMBER := NVL (LENGTH (p_source_string), 0);
v_length_of_target NUMBER := NVL (LENGTH (p_target_string), 0);
TYPE mytabtype IS TABLE OF NUMBER INDEX BY BINARY_INTEGER;
column_to_left mytabtype;
current_column mytabtype;
v_cost NUMBER := 0;
BEGIN
IF v_length_of_source = 0 THEN
RETURN v_length_of_target;
ELSIF v_length_of_target = 0 THEN
RETURN v_length_of_source;
ELSE
FOR j IN 0 .. v_length_of_target LOOP
column_to_left(j) := j;
END LOOP;
FOR i IN 1.. v_length_of_source LOOP
current_column(0) := i;
FOR j IN 1 .. v_length_of_target LOOP
IF SUBSTR (p_source_string, i, 1) =
SUBSTR (p_target_string, j, 1)
THEN v_cost := 0;
ELSE v_cost := 1;
END IF;
current_column(j) := LEAST (current_column(j-1) + 1,
column_to_left(j) + 1,
column_to_left(j-1) + v_cost);
END LOOP;
FOR j IN 0 .. v_length_of_target LOOP
column_to_left(j) := current_column(j);
END LOOP;
END LOOP;
END IF;
RETURN current_column(v_length_of_target);
END ld;
--
Jason Kohles
email at jasonkohles.com
http://www.jasonkohles.com/
"A witty saying proves nothing." -- Voltaire
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.scsys.co.uk/pipermail/dbix-class/attachments/20070302/d2f95a12/attachment-0001.htm
More information about the Dbix-class
mailing list