<HTML><BODY style="word-wrap: break-word; -khtml-nbsp-mode: space; -khtml-line-break: after-white-space; "><DIV><DIV>On Mar 2, 2007, at 10:42 AM, Mario Minati wrote:</DIV><BR class="Apple-interchange-newline"><BLOCKQUOTE type="cite"><DIV style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">Hello @all,</DIV><DIV style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; min-height: 14px; "><BR></DIV><DIV style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">I'm looking for a solution to find out if there is already some data in my dataset that is similar to a new entry.</DIV><DIV style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; min-height: 14px; "><BR></DIV><DIV style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">Example:</DIV><DIV style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">Companynames</DIV><DIV style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">I would like to find out if there are already companies in my addressbook (DB) which are similar to a given name to avoid double entries.</DIV><DIV style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; min-height: 14px; "><BR></DIV><DIV style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">How to measure similarity:</DIV><DIV style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">I'am thinking of the hammingdistance. That means the difference between Linux and Linus is 1 as there is one letter different. The distance between Linux and Lisa is 3 as there is one letter more and two are different.</DIV><DIV style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; min-height: 14px; "><BR></DIV><DIV style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">Does anyone have an idea how to realize that?</DIV><DIV style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">Can one realize this with code running on the database (PL/SQL or something) or is there a way doing that with DBIx::Class (drawback: all data had to read before processing).</DIV></BLOCKQUOTE><BR></DIV><DIV>Many databases have either built-in or extensions available to do this. The only ones that I have used are from the PostgreSQL contrib module fuzzystrmatch, which provides functions for calculating Levenshtein Distance and Soundex (possibly others, I can't recall).</DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV>If your database doesn't support it directly, for some solutions (Soundex, Metaphone) where you are calculating a checksum of the value and then looking for ones where the checksum is the same, you can precalculate the value when updating the database, and then just do a simple comparison search. Something like this would probably work...</DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV>package MyDB::Company;</DIV><DIV>use base 'DBIx::Class';</DIV><DIV>use Text::Metaphone qw();</DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV>__PACKAGE__->add_columns( qw( company company_metaphone ) );</DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV>sub calculate_company_metaphone {</DIV><DIV><SPAN class="Apple-tab-span" style="white-space:pre">        </SPAN>my ( $self, $value ) = @_;</DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV><SPAN class="Apple-tab-span" style="white-space:pre">        </SPAN>my $co = lc( $value );</DIV><DIV><SPAN class="Apple-tab-span" style="white-space:pre">        </SPAN>$co =~ s/[^\w\s]//g;</DIV><DIV><SPAN class="Apple-tab-span" style="white-space:pre">        </SPAN>$co =~ s/\b(incorporated|inc|corporation|corp)\b/INC/;</DIV><DIV><SPAN class="Apple-tab-span" style="white-space:pre">        </SPAN>$co =~ s/\b(company|co)\b/CO/;</DIV><DIV><SPAN class="Apple-tab-span" style="white-space:pre">        </SPAN># probably some other useful standardization stuff here</DIV><DIV><SPAN class="Apple-tab-span" style="white-space:pre">        </SPAN>return Text::Metaphone::Metaphone( $co );</DIV><DIV>}</DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV>sub store_column {</DIV><DIV><SPAN class="Apple-tab-span" style="white-space:pre">        </SPAN>my ( $self, $name, $value ) = @_;</DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV><SPAN class="Apple-tab-span" style="white-space:pre">        </SPAN>if ( $name eq 'company' ) {</DIV><DIV><SPAN class="Apple-tab-span" style="white-space:pre">                </SPAN>$self->company_metaphone( $self->calculate_comany_metaphone( $value ) );</DIV><DIV><SPAN class="Apple-tab-span" style="white-space:pre">        </SPAN>}</DIV><DIV><SPAN class="Apple-tab-span" style="white-space:pre">        </SPAN>$self->next::method( $name, $value );</DIV><DIV>}</DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV>Then you can simply search something like...</DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV>my $rs = $schema->resultset( 'Company' )->search( {</DIV><DIV><SPAN class="Apple-tab-span" style="white-space:pre">        </SPAN>company_metaphone<SPAN class="Apple-tab-span" style="white-space:pre">        </SPAN>=> MyDB::Company->calculate_company_metaphone( $company_name ),</DIV><DIV>} );</DIV><DIV> </DIV><DIV> <SPAN class="Apple-style-span" style="border-collapse: separate; border-spacing: 0px 0px; color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; text-align: auto; -khtml-text-decorations-in-effect: none; text-indent: 0px; -apple-text-size-adjust: auto; text-transform: none; orphans: 2; white-space: normal; widows: 2; word-spacing: 0px; "><SPAN class="Apple-style-span" style="border-collapse: separate; border-spacing: 0px 0px; color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; text-align: auto; -khtml-text-decorations-in-effect: none; text-indent: 0px; -apple-text-size-adjust: auto; text-transform: none; orphans: 2; white-space: normal; widows: 2; word-spacing: 0px; "><DIV>-- </DIV><DIV>Jason Kohles</DIV><DIV><A href="mailto:email@jasonkohles.com">email@jasonkohles.com</A></DIV><DIV><A href="http://www.jasonkohles.com">http://www.jasonkohles.com</A>/</DIV><DIV>"A witty saying proves nothing." -- Voltaire</DIV><BR class="Apple-interchange-newline"></SPAN></SPAN> </DIV><BR></BODY></HTML>