[Dbix-class] Maybe OT - How to create a result set based on 'similarity'?

John Napiorkowski jjn1056 at yahoo.com
Fri Mar 2 17:08:30 GMT 2007



----- Original Message ----
From: Mario Minati <mario at minati.de>
To: dbix-class at lists.rawmode.org
Sent: Friday, March 2, 2007 10:42:29 AM
Subject: [Dbix-class] Maybe OT - How to create a result set based on 'similarity'?

Hello @all,

I'm looking for a solution to find out if there is already some data in 
my dataset that is similar to a new entry.

Example:
Company names
I would like to find out whether there are already companies in my
address book (DB) whose names are similar to a given name, to avoid duplicate entries.

How to measure similarity:
I'm thinking of the Hamming distance. That means the difference between
Linux and Linus is 1, as one letter is different. The distance
between Linux and Lisa is 3, as there is one letter more and two are
different.
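
A note on the measure: the Linux/Lisa example compares strings of different lengths, which is what the Levenshtein (edit) distance covers; strict Hamming distance is only defined for strings of equal length. Below is a minimal sketch of the edit distance, written in Python purely for illustration. The function name `levenshtein` is an assumption and does not come from this thread or from DBIx::Class.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: the minimum number of single-character insertions,
    deletions, or substitutions needed to turn a into b."""
    # Only the previous row of the dynamic-programming table is kept.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep a match)
            ))
        previous = current
    return previous[-1]


print(levenshtein("Linux", "Linus"))  # 1
print(levenshtein("Linux", "Lisa"))   # 3
```

Both results agree with the distances given in the example above.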

Does anyone have an idea how to implement that?
Can one do this with code running on the database (PL/SQL or
something), or is there a way of doing that with DBIx::Class (drawback: all
data would have to be read before processing)?

Thank you for any hint.

Greets,
Mario Minati

Mario,

This seems more like a job for a search engine.  PostgreSQL has done some work in this area, so you might want to check their site.  I think doing this in plain SQL would be prohibitive.  I can imagine building a SQL statement that returns all rows in a table where a given column has a value that differs by one or two characters in the way you describe, but for anything bigger than that you'd end up with quite a large SQL statement.  I'd try to do this using the database's built-in capabilities if I could.  If the dataset is small, then doing it in Perl would be easy as well, but you are going to generate lots of database traffic.  If that's not an issue (say, the job runs on a scheduler during a low-activity period), you could cache the resultset out to disk to avoid filling all your memory.
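
To make the "large SQL statement" point concrete, here is a rough sketch of one way such a statement could be built: one LIKE pattern per combination of positions allowed to differ. It is written in Python with an in-memory SQLite database purely for illustration. The table `company`, the column `name`, and the helpers `like_patterns` and `find_similar` are all hypothetical, and the sketch only covers substituted letters in names of the same length, not added or missing ones.

```python
import sqlite3
from itertools import combinations


def like_patterns(name, max_diff):
    """LIKE patterns matching same-length strings that differ from name in
    at most max_diff positions ('_' matches exactly one character).
    Assumes name itself contains no literal '%' or '_'."""
    k = min(max_diff, len(name))
    patterns = []
    for positions in combinations(range(len(name)), k):
        chars = list(name)
        for pos in positions:
            chars[pos] = "_"
        patterns.append("".join(chars))
    return patterns


def find_similar(conn, name, max_diff):
    """Query the hypothetical company table with one LIKE per pattern."""
    patterns = like_patterns(name, max_diff)
    where = " OR ".join("name LIKE ?" for _ in patterns)
    sql = f"SELECT name FROM company WHERE {where} ORDER BY name"
    return [row[0] for row in conn.execute(sql, patterns)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE company (name TEXT)")
conn.executemany(
    "INSERT INTO company VALUES (?)",
    [("Linux",), ("Linus",), ("Lisa",), ("Minux",), ("Acme",)],
)

print(len(like_patterns("Linux", 1)))  # 5
print(len(like_patterns("Linux", 2)))  # 10
print(find_similar(conn, "Linux", 1))  # ['Linus', 'Linux', 'Minux']
```

The number of patterns is the binomial coefficient C(n, k) for a name of length n and k differing positions, so the statement grows quickly for longer names or larger distances. Note also that SQLite's LIKE is case-insensitive for ASCII letters by default; other databases may behave differently.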

good luck!
--john

_______________________________________________
List: http://lists.rawmode.org/cgi-bin/mailman/listinfo/dbix-class
Wiki: http://dbix-class.shadowcatsystems.co.uk/
IRC: irc.perl.org#dbix-class
SVN: http://dev.catalyst.perl.org/repos/bast/trunk/DBIx-Class/
Searchable Archive: http://www.mail-archive.com/dbix-class@lists.rawmode.org/





 


