[Catalyst] Output as XML

Fri May 16 04:36:59 BST 2008

J,

My solution was better suited for the DBIx::Class list I suppose, but
I posed the question here to see if there was already some sort of
Catalyst solution I had overlooked... a view for example.

I took a look at that part of the cookbook before, but it seems to
only apply if you're pulling one table row, not a record set.

Thanks for the advice...

/Mitch

On Thu, May 15, 2008 at 9:36 PM, J. Shirley <jshirley at gmail.com> wrote:
> On Thu, May 15, 2008 at 10:49 PM, Mitch Jackson <perimus at gmail.com> wrote:
>> Russell,
>>
>> Thanks for the suggestion.  I looked at that, however it basically
>> does what I'm already doing.  The bottleneck wasn't so much TT, but
>> the creation of thousands of DBIC objects and sticking them into an
>> array.  The same would need to be done with C::V::Rest::XML, as it
>> serializes the stash.  I needed an approach that generated the XML
>> while walking the query results, rather than caching them all into
>> memory first.
>>
>> Here's what I ended up doing.  It needs more work to support joins or
>> complex queries, but the speed difference is insane.  Here's benchmark
>> results between pulling 100, 1000, 5000 and 15000 table rows using the
>> old way and the following function.  As you can see, sending a DBIC
>> array of 15,000 rows to TT took 228 seconds to render :-(  This xml()
>> method took 1.65 seconds.
>>
>> $ perl xmlbench.pl
>>          Rate obj 100 xml 100
>> obj 100 3.53/s      --    -94%
>> xml 100 62.5/s   1669%      --
>>
>>         s/iter obj 1000 xml 1000
>> obj 1000   3.38       --     -97%
>> xml 1000  0.112    2932%       --
>>
>>         s/iter obj 5000 xml 5000
>> obj 5000   32.3       --     -98%
>> xml 5000  0.549    5779%       --
>>
>>          s/iter obj 15000 xml 15000
>> obj 15000    228        --      -99%
>> xml 15000   1.65    13753%
>>
>>
>> ## include in the schema class
>>
>> # Use in place of ->search to return an XML document containing the
>> # records for the query
>> #
>> # my $xml = $schema->xml('table',{field => 'value'},{rows => 20});
>> sub xml {
>>    my ( $self, $model, @search_params ) = @_;
>>
>>    croak 'xml( $model, @params ) requires a model parameter'
>>        unless defined $model and $model;
>>
>>    my %xml_escape_map = (
>>        '<' => '&lt;',
>>        '>' => '&gt;',
>>        '"' => '&quot;',
>>        '&' => '&amp;',
>>    );
>>
>>    # Prepare the query
>>    my $rs  = $self->resultset($model)->search(@search_params);
>>    croak "xml() unable to prepare query" unless defined $rs;
>>
>>    # Begin the XML document
>>    my $xml = '<?xml versiono="1.0" encoding="utf-8" ?>'."\n"
>>            . "<total_records>$rs</total_records>\n"
>>            . '<records>'."\n";
>>
>>    # Add an xml block for each record in the set
>>    my @cols = $self->resultset($model)->result_source->columns;
>>    my $cursor = $rs->cursor;
>>    while ( my @rec = $cursor->next ) {
>>        $xml .= '<record>'."\n";
>>        for my $f ( @cols ) {
>>            my $v = shift @rec;
>>            $v =~ s/([\<\>\"\&])/$xml_escape_map{$1}/g;
>>            $xml .= "    <${f}>$v</${f}>\n";
>>        }
>>        $xml .= '</record>'."\n";
>>    }
>>
>>    # Terminate the xml
>>    $xml .= '</records>'."\n";
>>    return $xml;
>> }
>>
>> /Mitchell K. Jackson
>>
>> On Wed, May 14, 2008 at 10:10 PM, Russell Jurney <rjurney at lucision.com> wrote:
>>> Have you thought about using
>>> this: http://search.cpan.org/~sri/Catalyst-View-REST-XML-0.01/XML.pm with
>>> raw data to achieve the desired speed?  Not sure where your bottleneck is,
>>> but if TT is a problem then I assume XML::Simple is faster than TT to
>>> serialize XML?
>>> Russell Jurney
>>> rjurney at lucision.com
>>>
>>>
>>> On May 14, 2008, at 10:02 AM, Mitch Jackson wrote:
>>>
>>> Good morning!
>>> I'm about to start working on some DBIC query to XML code, but before
>>> I do I was wondering if anybody out there has already done this, or if
>>> perhaps my approach is thick-headed.
>>> I'm generating XML from database queries in a catalyst app.  At the
>>> moment, I am doing it a bit like this (simplified for readability):
>>> ---------------------------------------------------------------------
>>> # controller.pm  /controller/action/parm1/parm2/parm3/something.xml
>>> sub action : Local {
>>>   ...
>>>   $c->stash->{records} = [ $c->model('table')->search( {}, { rows =>
>>> 20, page 2 } ) ];
>>>   $c->res->content_type('text/xml');
>>>   $c-.res->header('Content-disposition' => 'attachment;
>>> filename=action_${timestamp}.xml');
>>>   $c->res->template('xml/action.xml');
>>> }
>>> # xml/action.xml
>>> <?xml version="1.0" encoding="utf-8" ?>
>>> <records>
>>> [% FOREACH record IN records -%]
>>> <record id="[% record.id %]">
>>>   <field1>[% record.field1 %]</field1>
>>>   <field2>[% record.field2 %]</field2>
>>>   <field3>[% record.field3 %]</field3>
>>> </record>
>>> [% END # foreach record -%]
>>> </records>
>>> -------------------------------------------------------------------------------
>>> This approach works fine for paged record sets ( that get loaded into
>>> an ExtJS ajax grid ).  When I use this on a record set of 15k-16k
>>> records, the app goes to 100% CPU and cannot complete the request
>>> after several minutes.  There is a lot of overhead to generate 16k
>>> DBIC objects, dump them in an array, and then manipulate them through
>>> TT.
>>> This speed problem is unacceptable for my app, especially considering
>>> my users may be dealing with much larger datasets than this.
>>> One solution would be to write something proprietary to this
>>> implementation as a module that would throw away the overhead bloat
>>> and generate the XML file efficiently... but I want something reusable
>>> in the long term from a catalyst perespective.
>>> I am considering writing some sort of DBIC query to XML code that
>>> would use the DBI cursor directly to bypass object creation and build
>>> the XML while looping through the results.
>>> (http://search.cpan.org/~ash/DBIx-Class-0.08010/lib/DBIx/Class/Manual/Cookbook.pod#Get_raw_data_for_blindingly_fast_results)
>>> An interface like this:
>>> my $xml = $c->model('table')->xml( \%filter_parms, \%dbic_opts )
>>> That would generate output like this
>>> <?xml version="1.0" encoding="utf-8" ?>
>>> <records>
>>> <record id="42">
>>>   <field1>don't panic</field1>
>>>   <field2>vogon poetry</field2>
>>>   <field3>see if i don't</field3>
>>> </record>
>>> ...
>>> </records>
>>> The questions I pose are this:
>>> - Is there something already out there that does what I need?
>>> - Is there a big problem with my approach?
>>> - Would anybody else be interested in this if I get it working?
>>>
>>> Kind Regards,
>>> /Mitchell K. Jackson
>
>
> You probably would do better asking on the DBIx::Class mailing list.
> The other thing is that you don't need to inflate into DBIC objects,
> as the DBIC Cookbook states:
>
> http://search.cpan.org/~ash/DBIx-Class-0.08010/lib/DBIx/Class/Manual/Cookbook.pod#Skip_object_creation_for_faster_results
>
> That will handle the serialization out to just a hash.
>
> -J
>
> _______________________________________________
> List: Catalyst at lists.scsys.co.uk
> Listinfo: http://lists.scsys.co.uk/cgi-bin/mailman/listinfo/catalyst
> Searchable archive: http://www.mail-archive.com/catalyst@lists.scsys.co.uk/
> Dev site: http://dev.catalyst.perl.org/
>