[Catalyst] Output as XML
J. Shirley
jshirley at gmail.com
Fri May 16 03:36:52 BST 2008
On Thu, May 15, 2008 at 10:49 PM, Mitch Jackson <perimus at gmail.com> wrote:
> Russell,
>
> Thanks for the suggestion. I looked at that, however it basically
> does what I'm already doing. The bottleneck wasn't so much TT, but
> the creation of thousands of DBIC objects and sticking them into an
> array. The same would need to be done with C::V::Rest::XML, as it
> serializes the stash. I needed an approach that generated the XML
> while walking the query results, rather than caching them all into
> memory first.
>
> Here's what I ended up doing. It needs more work to support joins or
> complex queries, but the speed difference is insane. Here's benchmark
> results between pulling 100, 1000, 5000 and 15000 table rows using the
> old way and the following function. As you can see, sending a DBIC
> array of 15,000 rows to TT took 228 seconds to render :-( This xml()
> method took 1.65 seconds.
>
> $ perl xmlbench.pl
> Rate obj 100 xml 100
> obj 100 3.53/s -- -94%
> xml 100 62.5/s 1669% --
>
> s/iter obj 1000 xml 1000
> obj 1000 3.38 -- -97%
> xml 1000 0.112 2932% --
>
> s/iter obj 5000 xml 5000
> obj 5000 32.3 -- -98%
> xml 5000 0.549 5779% --
>
> s/iter obj 15000 xml 15000
> obj 15000 228 -- -99%
> xml 15000 1.65 13753%
>
>
> ## include in the schema class
>
> # Use in place of ->search to return an XML document containing the
> # records for the query
> #
> # my $xml = $schema->xml('table',{field => 'value'},{rows => 20});
> sub xml {
> my ( $self, $model, @search_params ) = @_;
>
> croak 'xml( $model, @params ) requires a model parameter'
> unless defined $model and $model;
>
> my %xml_escape_map = (
> '<' => '<',
> '>' => '>',
> '"' => '"',
> '&' => '&',
> );
>
> # Prepare the query
> my $rs = $self->resultset($model)->search(@search_params);
> croak "xml() unable to prepare query" unless defined $rs;
>
> # Begin the XML document
> my $xml = '<?xml versiono="1.0" encoding="utf-8" ?>'."\n"
> . "<total_records>$rs</total_records>\n"
> . '<records>'."\n";
>
> # Add an xml block for each record in the set
> my @cols = $self->resultset($model)->result_source->columns;
> my $cursor = $rs->cursor;
> while ( my @rec = $cursor->next ) {
> $xml .= '<record>'."\n";
> for my $f ( @cols ) {
> my $v = shift @rec;
> $v =~ s/([\<\>\"\&])/$xml_escape_map{$1}/g;
> $xml .= " <${f}>$v</${f}>\n";
> }
> $xml .= '</record>'."\n";
> }
>
> # Terminate the xml
> $xml .= '</records>'."\n";
> return $xml;
> }
>
> /Mitchell K. Jackson
>
> On Wed, May 14, 2008 at 10:10 PM, Russell Jurney <rjurney at lucision.com> wrote:
>> Have you thought about using
>> this: http://search.cpan.org/~sri/Catalyst-View-REST-XML-0.01/XML.pm with
>> raw data to achieve the desired speed? Not sure where your bottleneck is,
>> but if TT is a problem then I assume XML::Simple is faster than TT to
>> serialize XML?
>> Russell Jurney
>> rjurney at lucision.com
>>
>>
>> On May 14, 2008, at 10:02 AM, Mitch Jackson wrote:
>>
>> Good morning!
>> I'm about to start working on some DBIC query to XML code, but before
>> I do I was wondering if anybody out there has already done this, or if
>> perhaps my approach is thick-headed.
>> I'm generating XML from database queries in a catalyst app. At the
>> moment, I am doing it a bit like this (simplified for readability):
>> ---------------------------------------------------------------------
>> # controller.pm /controller/action/parm1/parm2/parm3/something.xml
>> sub action : Local {
>> ...
>> $c->stash->{records} = [ $c->model('table')->search( {}, { rows =>
>> 20, page 2 } ) ];
>> $c->res->content_type('text/xml');
>> $c-.res->header('Content-disposition' => 'attachment;
>> filename=action_${timestamp}.xml');
>> $c->res->template('xml/action.xml');
>> }
>> # xml/action.xml
>> <?xml version="1.0" encoding="utf-8" ?>
>> <records>
>> [% FOREACH record IN records -%]
>> <record id="[% record.id %]">
>> <field1>[% record.field1 %]</field1>
>> <field2>[% record.field2 %]</field2>
>> <field3>[% record.field3 %]</field3>
>> </record>
>> [% END # foreach record -%]
>> </records>
>> -------------------------------------------------------------------------------
>> This approach works fine for paged record sets ( that get loaded into
>> an ExtJS ajax grid ). When I use this on a record set of 15k-16k
>> records, the app goes to 100% CPU and cannot complete the request
>> after several minutes. There is a lot of overhead to generate 16k
>> DBIC objects, dump them in an array, and then manipulate them through
>> TT.
>> This speed problem is unacceptable for my app, especially considering
>> my users may be dealing with much larger datasets than this.
>> One solution would be to write something proprietary to this
>> implementation as a module that would throw away the overhead bloat
>> and generate the XML file efficiently... but I want something reusable
>> in the long term from a catalyst perespective.
>> I am considering writing some sort of DBIC query to XML code that
>> would use the DBI cursor directly to bypass object creation and build
>> the XML while looping through the results.
>> (http://search.cpan.org/~ash/DBIx-Class-0.08010/lib/DBIx/Class/Manual/Cookbook.pod#Get_raw_data_for_blindingly_fast_results)
>> An interface like this:
>> my $xml = $c->model('table')->xml( \%filter_parms, \%dbic_opts )
>> That would generate output like this
>> <?xml version="1.0" encoding="utf-8" ?>
>> <records>
>> <record id="42">
>> <field1>don't panic</field1>
>> <field2>vogon poetry</field2>
>> <field3>see if i don't</field3>
>> </record>
>> ...
>> </records>
>> The questions I pose are this:
>> - Is there something already out there that does what I need?
>> - Is there a big problem with my approach?
>> - Would anybody else be interested in this if I get it working?
>>
>> Kind Regards,
>> /Mitchell K. Jackson
You probably would do better asking on the DBIx::Class mailing list.
The other thing is that you don't need to inflate into DBIC objects,
as the DBIC Cookbook states:
http://search.cpan.org/~ash/DBIx-Class-0.08010/lib/DBIx/Class/Manual/Cookbook.pod#Skip_object_creation_for_faster_results
That will handle the serialization out to just a hash.
-J
More information about the Catalyst
mailing list