[Dbix-class] paging question

Ihnen, David dihnen at amazon.com
Wed Feb 3 18:40:12 GMT 2010


It takes four hours to run the query?  I have written queries affecting grouping and sorting 6 billion rows that ran several orders of magnitude faster than that.

I suspect that you need to consider appropriate indexes to support your where clauses?

If you truly working with data of that billion-record magnitude, your heap is too big to query this way.  You'll need to take an alternative approach such as denormalizing the data store into derived data closer to the final form that will be needed, then maintain the denormalized data to be in sync.  (That denormalization could take the form of creating a smaller summary table, which could be queried with DBIx::Class - I did that for statistics reporting before.  Though I could query the stats directly, it was very expensive.)

David

-----Original Message-----
From: James Marca [mailto:jmarca at translab.its.uci.edu] 
Sent: Wednesday, February 03, 2010 10:16 AM
To: DBIx::Class user and developer list
Subject: Re: [Dbix-class] paging question

On Tue, Feb 02, 2010 at 11:14:27AM -0800, James Marca wrote:
> Hi.  
> 
> First DBIx::Class is excellent and kudos all around.
> 
...

> My query is rather large and expensive.  I'm using paging not to drive
> a UI, but because when running the request without paging psql
> complained that it ran out of disk space for the tmp file it needed
> for the cursor (I forget the exact wording of the error).  My page
> size is currently set at 4 million rows, which seems fine.
> 
> My guess is that under the hood paging is preparing the query, then
> executing that prepared statement repeatedly with different offsets.
> However, in practice it *appears* (I can't prove this yet so I could
> be wrong) that the page(2), page(3), etc queries take just as long to
> execute as the first page(1) query, as if they aren't running off of
> the same prepared statement.
> 
> And because the queries take so long, I really don't want to run that
> last page(n+1) query, wait an age, and then  get nothing.  

More details.  I stuck in a line to print the timestamp at the
beginning and end of the query.  The queries for page 1 and page 2
both took around 4 hours to complete.  While it could be that I'm just
out of luck here, I'd appreciate some advice from anybody who knows
the code well as to whether or not manually setting up a prepared
statement is worth it.

>From what I read in the code and from reading the docs, I see that
DBIx::Class caches statements, but I'm not sure whether or not that is
the same thing as reusing a prepared statement with different
parameters.

Regards,
James

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


_______________________________________________
List: http://lists.scsys.co.uk/cgi-bin/mailman/listinfo/dbix-class
IRC: irc.perl.org#dbix-class
SVN: http://dev.catalyst.perl.org/repos/bast/DBIx-Class/
Searchable Archive: http://www.grokbase.com/group/dbix-class@lists.scsys.co.uk



More information about the DBIx-Class mailing list