[Dbix-class] paging question
jmarca at translab.its.uci.edu
Wed Feb 3 22:59:03 GMT 2010
On Wed, Feb 03, 2010 at 02:14:35PM -0800, Ihnen, David wrote:
> Based on what you've said (5 minute summaries) I would expect your records have a date field that you are grouping in your query.
> My strategy for handling summarizing this type of data would
> probably be something on the order of an hour at a time in five
> minute groups. With a good b-tree ranging index on the date field,
> and the manageable data chunks from splitting it down that way I'd
> get 12 rows for each query that got a particular hour. I'd just
> iterate over the hours, flipping the binding value for the time
> until I got to the end of the data set.
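The quoted strategy could be sketched in DBIx::Class roughly as follows. This is a minimal sketch only, assuming a PostgreSQL backend and a hypothetical `Observation` result class with a b-tree-indexed timestamp column `ts` and a `value` column (all names are illustrative, not from the original thread):

```perl
use strict;
use warnings;
use DateTime;

my $rs   = $schema->resultset('Observation');
my $hour = $first_hour->clone;    # DateTime at the start of the data set

while ( $hour < $last_hour ) {
    my $next = $hour->clone->add( hours => 1 );

    # One small, index-limited query per hour: 12 five-minute buckets.
    my @buckets = $rs->search(
        { ts => { '>=' => $hour->iso8601, '<' => $next->iso8601 } },
        {
            select => [
                \"floor(extract(epoch from ts) / 300) * 300",   # 5-min bucket
                { avg => 'value' },
            ],
            as       => [ 'bucket', 'avg_value' ],
            group_by => \"floor(extract(epoch from ts) / 300) * 300",
        },
    )->all;

    # ... store @buckets in the denormalized summary table ...

    $hour = $next;    # rebind the window and move on
}
```

The scalar refs (`\"..."`) pass literal SQL through to the database, which keeps the bucketing expression identical in `select` and `group_by`.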
Yes, I just replied to another post along similar lines ("paging" by
month), but your idea seems better for the reasons you point out
(keeping the heap down, etc.). The only wrinkle is that I have about
10k detectors at each time step, so each hour query actually returns
12 x 10k rows. The alternative is to put an index on both detector id
and time (which I have done) and iterate over both.
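Iterating over both could look something like this hypothetical extension of the hourly query, assuming a composite index on `(detector_id, ts)` so that no single query returns more than one detector's 12 buckets (again, schema and column names are illustrative):

```perl
# For each detector within the current hour window, fetch its 12
# five-minute buckets; the composite index keeps every scan small.
for my $detector_id (@detector_ids) {
    my @buckets = $rs->search(
        {
            detector_id => $detector_id,
            ts          => { '>=' => $hour_start, '<' => $hour_end },
        },
        {
            select => [
                \"floor(extract(epoch from ts) / 300) * 300",
                { avg => 'value' },
            ],
            as       => [ 'bucket', 'avg_value' ],
            group_by => \"floor(extract(epoch from ts) / 300) * 300",
        },
    )->all;
    # ... write the 12 summary rows for ($detector_id, $hour_start) ...
}
```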
> With a 'last updated' timestamp (indexed of course) on the records,
> you can periodically get 'all of the hours with changed data since I
> last checked' and iterate over those hours to replace the summaries
> stored in the denormalized tables to maintain sync.
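One way to sketch that re-sync in DBIx::Class, assuming an indexed `last_updated` column on the raw table and a denormalized `Summary` result class (both hypothetical, not confirmed by the thread):

```perl
# Find the hours whose raw rows changed since the last sync, then
# recompute only those hours' summaries.
my @dirty_hours = $schema->resultset('Observation')->search(
    { last_updated => { '>' => $last_sync } },
    {
        select   => [ \"date_trunc('hour', ts)" ],
        as       => [ 'hour' ],
        group_by => \"date_trunc('hour', ts)",
    },
)->get_column('hour')->all;

for my $hour (@dirty_hours) {
    # Replace only the summaries whose source rows changed.
    $schema->resultset('Summary')->search( { hour => $hour } )->delete;
    # ... recompute and insert the 12 buckets for $hour ...
}

$last_sync = DateTime->now;    # remember when we last checked
```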
> And in the end never have a data set on the large heap generated of
> more than 12 rows output, and never ask the database to work with
> the entire data set - just the subset that the b-tree index limits
> it to.
Excellent ideas. Thanks.
> Exciting stuff, working with data sets that large.
Exciting isn't quite the word I'd use after getting not very far, not
very fast. But it is fun to max out 8 cores and fill up disks and RAM
that were unobtainable just a few years ago!