[Catalyst-commits] r13738 - trunk/examples/CatalystAdvent/root/2010

dhoss at dev.catalyst.perl.org dhoss at dev.catalyst.perl.org
Fri Dec 3 04:40:19 GMT 2010


Author: dhoss
Date: 2010-12-03 04:40:19 +0000 (Fri, 03 Dec 2010)
New Revision: 13738

Added:
   trunk/examples/CatalystAdvent/root/2010/2.pod
Log:
bombs away

Added: trunk/examples/CatalystAdvent/root/2010/2.pod
===================================================================
--- trunk/examples/CatalystAdvent/root/2010/2.pod	                        (rev 0)
+++ trunk/examples/CatalystAdvent/root/2010/2.pod	2010-12-03 04:40:19 UTC (rev 13738)
@@ -0,0 +1,270 @@
+=head1 Creating an Easy to Manage Search Engine with Catalyst and ElasticSearch
+
+=head1 Overview
+
+L<ElasticSearch|http://www.elasticsearch.com> is a search engine based on Lucene that has a number of really cool features that, in my opinion, elevate it above a number of L<other|http://lucene.apache.org/solr/> L<search|http://sphinxsearch.com/> L<engines|http://www.rectangular.com/kinosearch/>.
+
+For instance, it's schema-less, which some would argue is a bad thing, but the way things are indexed in ElasticSearch (indexed "things" are called documents) allows the user to create a sort of per-document schema, much like you would with MongoDB or other document-based storage engines.  It also has an "autodiscovery" feature for finding other ElasticSearch instances on the network.  All you have to do is run C<bin/elasticsearch> on the machines you want to cluster and poof, you have a distributed and fault-tolerant index.
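+
+For example, nothing stops two documents with completely different fields from living side by side in the same index (the field names here are made up for illustration):
+
+    { "type" : "email", "from" : "santa@example.com", "subject" : "ho ho ho" }
+
+    { "type" : "tweet", "user" : "dhoss", "text" : "merry christmas" }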
+
+So, moving forward, let's get into some code and setup.
+
+=head1 Getting ElasticSearch
+
+=over 12
+
+=item Step 1
+
+Download your desired version and build of ElasticSearch here: L<http://www.elasticsearch.com/download/> (you can also L<build from source|http://www.elasticsearch.com/download/master/>).
+
+=item Step 2
+
+Decompress (or build) ElasticSearch into your desired location.  It's really not important where you do this, but /opt/elasticsearch is where I put mine.
+
+=item Step 3
+
+Start your instances by typing C<bin/elasticsearch> in the root directory where you decompressed ElasticSearch.  You can also run with the C<-f> switch to have it run in the foreground and spit out debug information.
+
+=back
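+
+Once an instance is up, you can sanity-check it over HTTP.  A small sketch (this assumes the default port 9200; the fallback C<echo> just keeps the command friendly when nothing is listening yet):
+
```shell
# Default ElasticSearch HTTP port; change this if you edited config/elasticsearch.yml
ES_URL="http://localhost:9200"
echo "checking $ES_URL"
# --max-time keeps curl from hanging; the || branch fires when no node is up
curl -s --max-time 2 "$ES_URL/_cluster/health" || echo "no ElasticSearch instance reachable"
```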
+
+=head1 A Simple API Introduction
+
+L<ElasticSearch|http://search.cpan.org/~drtech/ElasticSearch-0.27/lib/ElasticSearch.pm> is the Perl binding to the ElasticSearch REST API, and is written (marvelously) by Clinton Gormley.  It has a few key methods we will be using in this article.
+
+=over 12
+
+=item C<new>
+
+Creates your connection to your ElasticSearch instance(s).  
+
+=item C<index>
+
+Indexes your data.  Takes an index name, a document id (unique, autogenerated if you leave it out), and your data which should be in the form of a hashref.
+
+=item C<search>
+
+Searches your indexed data.  Takes an index name, a query type (you can also type your documents when you index them: a document might be an email, for instance, or a tweet), and your query string.  There are a number of search options you can use to query your data, but the one we'll use here is the C<field> query.
+
+=back
+
+Okay.  So that's the basic ElasticSearch API.  There are plenty of L<examples|http://www.elasticsearch.com/docs/elasticsearch/rest_api/> on the site you can check out if you feel you need to grok this more thoroughly.  Next, we figure out how to tie this thing to Catalyst.
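+
+All of those operations map straight onto the REST API as well.  As a sketch (the index name C<myindex> and the document body are made up; a C<PUT> to C</index/type/id> creates or replaces a document, while C<POST> would autogenerate the id):
+
```shell
# A hand-rolled document; any JSON object will do since ElasticSearch is schema-less
DOC='{ "title" : "Hello", "body" : "my first document" }'
# Index it under an explicit id; the || branch fires when no node is listening
curl -s --max-time 2 -XPUT "http://localhost:9200/myindex/post/1" -d "$DOC" \
  || echo "no ElasticSearch instance reachable"
```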
+
+=head1 Catalyst::Model
+
+We will be creating a small model to hook ElasticSearch up to our Catalyst application.
+
+Code:
+
+Search.pm:
+
+    package Search;
+    
+    use Moose;
+    use namespace::autoclean;
+    use ElasticSearch;
+    
+    has 'es_object' => (
+        is       => 'ro',
+        isa      => 'ElasticSearch',
+        required => 1,
+        lazy     => 1,
+        default  =>  sub {
+            ElasticSearch->new(
+                servers     => 'localhost:9200',
+                transport   => 'httplite',
+                trace_calls => 'log_file',
+            );
+        },
+
+    );
+
+    sub index_data {
+        my ($self, %p) = @_;
+        $self->es_object->index(
+            index => $p{'index'},
+            type  => $p{'type'},
+            data  => $p{'data'},
+        );
+    }
+
+    sub execute_search {
+        my ($self, %p) = @_;
+        my $results =  $self->es_object->search(
+            index => $p{'index'},
+            type  => $p{'type'},
+            query => {
+                field => {
+                    _all => $p{'terms'},
+                },
+            }
+        );
+        return $results;
+    }
+
+
+
+    1;
+
+
+
+MyApp::Model::Search:
+
+    package MyApp::Model::Search;
+
+    use Moose;
+    use namespace::autoclean;
+
+    use Search;
+    extends 'Catalyst::Model', 'Search';
+
+    sub COMPONENT {
+        my ($class, $c, $config) = @_;
+        my $self = $class->new(%{ $config });
+
+        return $self;
+    }
+
+    __PACKAGE__->meta->make_immutable;
+
+    1;
+
+
+Okay.  So we have the search portion set up.  This will be called like C<< my $results = $c->model('Search')->execute_search(%opts) >> from inside our application.
+
+The next step is to set up an indexer.  My example uses DBIx::Class as the source of data to index, as that's what I originally wrote all this for.  However, you can use an arbitrary data source as long as you can break it up into the bits that ElasticSearch needs.
+
+The script:
+
+    use strict;
+    use warnings;
+    use Data::Dumper;
+
+    use Search;
+    use My::Schema;
+
+    my $schema = My::Schema->connect("dbi:Pg:dbname=mydb", "user", "pass");
+    my $search = Search->new;
+    my $rs = $schema->resultset('Entry')->search({ published => 1 });
+    print "Search obj: " . Dumper($search);
+    print "Beginning indexing\n";
+
+    while ( my $entry = $rs->next ) {
+        print "Indexing " . $entry->title . "\n";
+        my @attachments;    # fill in from $entry if your schema stores attachments
+        my $result = $search->index_data(
+            index => 'deimos',
+            type  => $entry->type,
+            data  => {
+                title         => $entry->title,
+                display_title => $entry->display_title,
+                author        => $entry->author->name,
+                created       => $entry->created_at . "",
+                updated       => $entry->updated_at . "",
+                body          => $entry->body,
+                attachments   => \@attachments,
+            },
+        );
+    }
+
+That is a basic script to get our data indexed.  To confirm, we can run a few cURL searches: 
+
+    curl -XGET 'http://127.0.0.1:9200/_all/_search' -d '
+    {
+       "query" : {
+          "field" : {
+             "_all" : "your search terms that you know will get you a document returned"
+          }
+       }
+    }'
+
+This will return something like:
+
+    {"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":4,"max_score":0.24368995,"hits":[{"_index":"ourindexdeimos","_type":"post","_id":"l_3Jw9PkRz2arFdHO3t5Pg","_score":0.24368995, "_source" : {
+    "thingy":"thingy data"
+    }}]}}
+
+If you get something that looks like that, congrats! Your data indexed properly.
+
+=head1 Executing searches within your application
+
+So here we go, what we all came here for.
+
+Here is the Search controller:
+
+    package MyApp::Controller::Search;
+    use Moose;
+    use namespace::autoclean;
+    BEGIN { extends 'Catalyst::Controller::REST'; }
+
+
+    sub base : Chained('/') PathPart('') CaptureArgs(0) {
+        my ($self, $c) = @_;
+        my $data = $c->req->data || $c->req->params;
+        my $results = $c->model('Search')->execute_search(
+            terms => $data->{'q'}, 
+            index => $data->{'index'} || "default", 
+            type => $data->{'type'} || "post" 
+        );
+        my @results;
+        for my $result ( @{$results->{'hits'}{'hits'}} ) {
+            my $r = $result->{'_source'};
+            my $body = substr($r->{'body'}, 0, 300);
+            $body .= "...";
+            push @results, {
+                display_title => $r->{'display_title'},
+                title   => $r->{'title'},
+                created => $r->{'created'},
+                updated => $r->{'updated'},
+                author  => $r->{'author'},
+                body    => $body,
+            };
+
+        }
+       $c->stash( results => \@results ); 
+
+    }
+
+
+    sub index :Chained('base') PathPart('search') Args(0) ActionClass('REST') {
+        my ($self, $c) = @_;
+    }
+
+    sub index_GET {
+        my ($self, $c) = @_;
+        $self->status_ok($c, 
+            entity => {
+                results => $c->stash->{'results'} ,
+            },
+        );
+    }
+
+
+
+    __PACKAGE__->meta->make_immutable;
+    1;
+
+And a simple template to display them: 
+
+    <h2>Search results for "<strong>[% c.req.param('q') %]</strong>":</h2>
+    <ul>
+    [% FOR result IN results %]
+    <li>
+    <div>By [% result.author %]</div>
+    <div><a href="[% c.uri_for_action('/url/to/your/document', [ result.title ]) %]">[% result.display_title %]</a></div>
+    <div>[% result.body %]</div>
+    </li>
+    [% END %]
+    </ul>
+
+And there you go.  A very simple, flexible, and relatively fast search engine, with the ability to use any data storage back end for your indexable data.
+
+=head1 Parting notes
+
+ElasticSearch is extremely customizable and tunable.  You can get a GREAT deal of performance improvement by playing with the indexing options, ranking algorithms, storage, and request transports.  All of this is documented at the L<ElasticSearch|http://www.elasticsearch.com> web site.
+
+One final thought: you can call the indexing code right after the "commit" step of your application's data store, so that a document is indexed virtually instantaneously upon its creation.
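+
+A minimal sketch of that idea (the C<Search> package below is a stub standing in for the real wrapper above, so the snippet runs on its own; in the application you would C<use Search;> and let C<index_data> talk to ElasticSearch):
+
```perl
use strict;
use warnings;

# Stub with the same interface as the Search wrapper from earlier;
# the real index_data would call $self->es_object->index(...).
package Search;
sub new        { bless {}, shift }
sub index_data {
    my ($self, %p) = @_;
    return "indexed '$p{data}{title}' into '$p{index}'";
}

package main;
my $search = Search->new;

# ...right after the new row has been committed to the database:
my $entry = { title => 'My first post', body => 'Hello' };  # stands in for the DBIC row
my $msg = $search->index_data(
    index => 'deimos',
    type  => 'post',
    data  => { title => $entry->{title}, body => $entry->{body} },
);
print "$msg\n";
```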
+
+Enjoy folks, I hope you find this as useful as I did!
+
+--Devin "dhoss" Austin, 2010.
+
+Created using Catalyst 5.80029 on a MacBook Pro, running Perl 5.12.0.
+
+=cut



