[Catalyst] Output as XML

J. Shirley jshirley at gmail.com
Fri May 16 03:36:52 BST 2008


On Thu, May 15, 2008 at 10:49 PM, Mitch Jackson <perimus at gmail.com> wrote:
> Russell,
>
> Thanks for the suggestion.  I looked at that, however it basically
> does what I'm already doing.  The bottleneck wasn't so much TT, but
> the creation of thousands of DBIC objects and sticking them into an
> array.  The same would need to be done with C::V::Rest::XML, as it
> serializes the stash.  I needed an approach that generated the XML
> while walking the query results, rather than caching them all into
> memory first.
>
> Here's what I ended up doing.  It needs more work to support joins or
> complex queries, but the speed difference is insane.  Here's benchmark
> results between pulling 100, 1000, 5000 and 15000 table rows using the
> old way and the following function.  As you can see, sending a DBIC
> array of 15,000 rows to TT took 228 seconds to render :-(  This xml()
> method took 1.65 seconds.
>
> $ perl xmlbench.pl
>          Rate obj 100 xml 100
> obj 100 3.53/s      --    -94%
> xml 100 62.5/s   1669%      --
>
>         s/iter obj 1000 xml 1000
> obj 1000   3.38       --     -97%
> xml 1000  0.112    2932%       --
>
>         s/iter obj 5000 xml 5000
> obj 5000   32.3       --     -98%
> xml 5000  0.549    5779%       --
>
>          s/iter obj 15000 xml 15000
> obj 15000    228        --      -99%
> xml 15000   1.65    13753%
>
>
> ## include in the schema class
>
> # Use in place of ->search to return an XML document containing the
> # records for the query
> #
> # my $xml = $schema->xml('table',{field => 'value'},{rows => 20});
> sub xml {
>    my ( $self, $model, @search_params ) = @_;
>
>    croak 'xml( $model, @params ) requires a model parameter'
>        unless defined $model and $model;
>
>    my %xml_escape_map = (
>        '<' => '&lt;',
>        '>' => '&gt;',
>        '"' => '&quot;',
>        '&' => '&amp;',
>    );
>
>    # Prepare the query
>    my $rs  = $self->resultset($model)->search(@search_params);
>    croak "xml() unable to prepare query" unless defined $rs;
>
>    # Begin the XML document
>    my $xml = '<?xml versiono="1.0" encoding="utf-8" ?>'."\n"
>            . "<total_records>$rs</total_records>\n"
>            . '<records>'."\n";
>
>    # Add an xml block for each record in the set
>    my @cols = $self->resultset($model)->result_source->columns;
>    my $cursor = $rs->cursor;
>    while ( my @rec = $cursor->next ) {
>        $xml .= '<record>'."\n";
>        for my $f ( @cols ) {
>            my $v = shift @rec;
>            $v =~ s/([\<\>\"\&])/$xml_escape_map{$1}/g;
>            $xml .= "    <${f}>$v</${f}>\n";
>        }
>        $xml .= '</record>'."\n";
>    }
>
>    # Terminate the xml
>    $xml .= '</records>'."\n";
>    return $xml;
> }
>
> /Mitchell K. Jackson
>
> On Wed, May 14, 2008 at 10:10 PM, Russell Jurney <rjurney at lucision.com> wrote:
>> Have you thought about using
>> this: http://search.cpan.org/~sri/Catalyst-View-REST-XML-0.01/XML.pm with
>> raw data to achieve the desired speed?  Not sure where your bottleneck is,
>> but if TT is a problem then I assume XML::Simple is faster than TT to
>> serialize XML?
>> Russell Jurney
>> rjurney at lucision.com
>>
>>
>> On May 14, 2008, at 10:02 AM, Mitch Jackson wrote:
>>
>> Good morning!
>> I'm about to start working on some DBIC query to XML code, but before
>> I do I was wondering if anybody out there has already done this, or if
>> perhaps my approach is thick-headed.
>> I'm generating XML from database queries in a catalyst app.  At the
>> moment, I am doing it a bit like this (simplified for readability):
>> ---------------------------------------------------------------------
>> # controller.pm  /controller/action/parm1/parm2/parm3/something.xml
>> sub action : Local {
>>   ...
>>   $c->stash->{records} = [ $c->model('table')->search( {}, { rows =>
>> 20, page 2 } ) ];
>>   $c->res->content_type('text/xml');
>>   $c-.res->header('Content-disposition' => 'attachment;
>> filename=action_${timestamp}.xml');
>>   $c->res->template('xml/action.xml');
>> }
>> # xml/action.xml
>> <?xml version="1.0" encoding="utf-8" ?>
>> <records>
>> [% FOREACH record IN records -%]
>> <record id="[% record.id %]">
>>   <field1>[% record.field1 %]</field1>
>>   <field2>[% record.field2 %]</field2>
>>   <field3>[% record.field3 %]</field3>
>> </record>
>> [% END # foreach record -%]
>> </records>
>> -------------------------------------------------------------------------------
>> This approach works fine for paged record sets ( that get loaded into
>> an ExtJS ajax grid ).  When I use this on a record set of 15k-16k
>> records, the app goes to 100% CPU and cannot complete the request
>> after several minutes.  There is a lot of overhead to generate 16k
>> DBIC objects, dump them in an array, and then manipulate them through
>> TT.
>> This speed problem is unacceptable for my app, especially considering
>> my users may be dealing with much larger datasets than this.
>> One solution would be to write something proprietary to this
>> implementation as a module that would throw away the overhead bloat
>> and generate the XML file efficiently... but I want something reusable
>> in the long term from a catalyst perespective.
>> I am considering writing some sort of DBIC query to XML code that
>> would use the DBI cursor directly to bypass object creation and build
>> the XML while looping through the results.
>> (http://search.cpan.org/~ash/DBIx-Class-0.08010/lib/DBIx/Class/Manual/Cookbook.pod#Get_raw_data_for_blindingly_fast_results)
>> An interface like this:
>> my $xml = $c->model('table')->xml( \%filter_parms, \%dbic_opts )
>> That would generate output like this
>> <?xml version="1.0" encoding="utf-8" ?>
>> <records>
>> <record id="42">
>>   <field1>don't panic</field1>
>>   <field2>vogon poetry</field2>
>>   <field3>see if i don't</field3>
>> </record>
>> ...
>> </records>
>> The questions I pose are this:
>> - Is there something already out there that does what I need?
>> - Is there a big problem with my approach?
>> - Would anybody else be interested in this if I get it working?
>>
>> Kind Regards,
>> /Mitchell K. Jackson


You probably would do better asking on the DBIx::Class mailing list.
The other thing is that you don't need to inflate into DBIC objects,
as the DBIC Cookbook states:

http://search.cpan.org/~ash/DBIx-Class-0.08010/lib/DBIx/Class/Manual/Cookbook.pod#Skip_object_creation_for_faster_results

That will handle the serialization out to just a hash.

-J



More information about the Catalyst mailing list