[Catalyst] advise on data processing in Cat/DBIC/Model
Toby Corkindale
tjc at wintrmute.net
Mon Nov 26 23:28:10 GMT 2007
On Mon, Nov 26, 2007 at 07:04:24PM +0000, Matt S Trout wrote:
> On Mon, Nov 26, 2007 at 04:33:02PM +0100, Rainer Clasen wrote:
> > Hello,
> >
> > within my current project, some value is collected up to once a day:
> >
> > CREATE TABLE a_value {
> > day date PRIMARY KEY,
> > other_values integer NOT NULL,
> > value integer
> > another_value integer
> > );
> >
> > Data comes in a bit sporadic - so I cannot rely each day having an entry.
> > Actually there also be longer periods (weeks/month/??) without data.
> >
> > I'm currently a bit at a loss on how to "properly" cook up this data to
> > easily display it in fixed time steps. I'm thinking of a list of *all*
> > days/weeks/month/... in a certain timerange. Such a list would allow the
> > view easy access to present the data (say as html table with one row per
> > time step or as input for GD::Graph).
> >
> > This means there are basically two tasks:
> > - aggregate the data for each time step: No-brainer with DBIx::Class.
> > - get NULL entries for time steps without data: The intersting part.
> >
> > I can come up the following solutions to generate the NULL entries:
> >
> > - use a SQL stored procedure or temp table with the start-dates of the
> > desired time-steps, do an outer join and stuff this in a DBIC
> > result_source as described in the DBIC cookbook under "arbitrary SQL".
> >
> > example query for ->name():
> > SELECT
> > d.id,
> > steps AS day,
> > d.value,
> > COALESCE( d.other_value, $4 ) AS other_value
> > FROM
> > timeseries( $1, $2, $3) AS steps
> > LEFT JOIN ( SELECT * FROM data WHERE other_value = $4 ) d
> > ON ( d.day >= $2 AND d.day + $1 < $3;
> > $1 = time steps. eg. '1 day'
> > $2 = start date. eg. '2007-11-1'
> > $3 = end date. eg '2007-11-30'
> > $4 = other_value to filter on.
> > timeseries(step,start,end) = stored procedure that returns the
> > start-dates of the time-steps within the specified time-range.
>
> I tend to do -sort- of this.
>
> Except that instead of using a function like timeseries() I'll create a
> pivot table with a 'date' column that I prepopulated with all dates from
> now to say 2020 (and make sure one of my cron jobs extends this when we
> reach say 2019 or so). Then I put function indexes on the various DATE_PART
> or equivalent functions that I might use to pull the month, year etc.
>
> That way I can query the pivot as "just another DBIC class" and everything
> gets simpler.
I believe that's the method Joe Celko recommends as well.
You can extend it later with public holidays or some other metadata, too.
tjc
More information about the Catalyst
mailing list