[Catalyst] advice on data processing in Cat/DBIC/Model

Mon Nov 26 19:04:24 GMT 2007

On Mon, Nov 26, 2007 at 04:33:02PM +0100, Rainer Clasen wrote:
> Hello,
> 
> within my current project, some value is collected up to once a day:
> 
>  CREATE TABLE a_value (
>   day date PRIMARY KEY,
>   other_value integer NOT NULL,
>   value integer,
>   another_value integer
>  );
> 
> Data comes in a bit sporadically, so I cannot rely on each day having an
> entry. There may actually also be longer periods (weeks/months/??) without data.
> 
> I'm currently a bit at a loss on how to "properly" cook up this data to
> easily display it in fixed time steps. I'm thinking of a list of *all*
> days/weeks/months/... in a certain time range. Such a list would make it
> easy for the view to present the data (say as an HTML table with one row
> per time step, or as input for GD::Graph).
> 
> This means there are basically two tasks:
> - aggregate the data for each time step: No-brainer with DBIx::Class.
> - get NULL entries for time steps without data: the interesting part.
> 
> I can come up with the following solutions to generate the NULL entries:
> 
> - use a SQL stored procedure or temp table with the start-dates of the
>   desired time-steps, do an outer join and stuff this in a DBIC
>   result_source as described in the DBIC cookbook under "arbitrary SQL".
> 
>   example query for ->name():
> 	SELECT
> 	 steps AS day,
> 	 d.value,
> 	 COALESCE( d.other_value, $4 ) AS other_value
> 	FROM
> 	 timeseries( $1, $2, $3 ) AS steps
> 	 LEFT JOIN ( SELECT * FROM a_value WHERE other_value = $4 ) d
> 	  ON ( d.day >= steps AND d.day < steps + $1 );
>   $1 = time step, e.g. '1 day'
>   $2 = start date, e.g. '2007-11-01'
>   $3 = end date, e.g. '2007-11-30'
>   $4 = other_value to filter on.
>   timeseries(step,start,end) = stored procedure that returns the 
> 	start-dates of the time-steps within the specified time-range.
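The quoted approach can be tried without a stored procedure. In the minimal sketch below, a recursive CTE generates the step start dates in place of the proposed `timeseries()` function, and a LEFT JOIN keeps every step, returning NULL (`None`) where no row falls inside it. It uses Python's `sqlite3` purely as a stand-in for the PostgreSQL setup in the thread; step sizes are assumed to be whole days (`step_days`), and `connect()` and `series()` are hypothetical helper names, not part of DBIC.

```python
import sqlite3

# Schema follows the corrected a_value table from the question.
SCHEMA = """
CREATE TABLE a_value (
    day           TEXT PRIMARY KEY,  -- ISO date, e.g. '2007-11-01'
    other_value   INTEGER NOT NULL,
    value         INTEGER,
    another_value INTEGER
)
"""

# The recursive CTE plays the role of timeseries(): it yields the start
# date of every step from :start up to :end, :step_days days apart.
QUERY = """
WITH RECURSIVE steps(day) AS (
    SELECT date(:start)
    UNION ALL
    SELECT date(day, '+' || :step_days || ' days')
    FROM steps
    WHERE date(day, '+' || :step_days || ' days') <= date(:end)
)
SELECT steps.day,
       SUM(d.value)  -- NULL when no row falls inside the step
FROM steps
LEFT JOIN a_value AS d
       ON d.day >= steps.day
      AND d.day < date(steps.day, '+' || :step_days || ' days')
      AND d.other_value = :other_value
GROUP BY steps.day
ORDER BY steps.day
"""


def connect():
    """Open an in-memory database containing the empty a_value table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def series(conn, start, end, step_days, other_value):
    """Return one (step_start, summed value or None) tuple per step."""
    params = {"start": start, "end": end,
              "step_days": step_days, "other_value": other_value}
    return conn.execute(QUERY, params).fetchall()
```

Calling `series(conn, "2007-11-01", "2007-11-30", 7, 1)` would give one row per week, with `None` for weeks without matching data; putting the `other_value` filter in the ON clause rather than in a WHERE clause is what keeps the empty steps in the result.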

I tend to do -sort- of this.

Except that instead of using a function like timeseries() I'll create a
pivot table with a 'date' column that I prepopulate with all dates from
now to, say, 2020 (and make sure one of my cron jobs extends it when we
reach, say, 2019). Then I put functional indexes on the various DATE_PART
(or equivalent) expressions I might use to pull out the month, year, etc.
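A minimal sketch of such a pivot table, assuming SQLite via Python's `sqlite3` rather than the PostgreSQL setup discussed here. SQLite has no `DATE_PART()`, so indexes on expressions over the ISO date string (supported since SQLite 3.9.0) stand in for the functional indexes; the table name `calendar` and the helper `ensure_calendar()` are hypothetical. Because the fill is idempotent, the cron job only has to call it again with a later end date.

```python
import sqlite3
from datetime import date, timedelta


def ensure_calendar(conn, first, last):
    """Create the pivot table if needed and make sure it holds every
    date from `first` to `last` inclusive.

    Safe to re-run: a cron job can call it with a later `last` to extend
    the table before it runs out.
    """
    conn.execute("CREATE TABLE IF NOT EXISTS calendar (day TEXT PRIMARY KEY)")
    # Expression indexes on the 'YYYY-MM-DD' string stand in for
    # functional indexes on DATE_PART('month' / 'year', ...).
    conn.execute("CREATE INDEX IF NOT EXISTS calendar_month"
                 " ON calendar (substr(day, 1, 7))")
    conn.execute("CREATE INDEX IF NOT EXISTS calendar_year"
                 " ON calendar (substr(day, 1, 4))")
    n_days = (last - first).days + 1
    rows = (((first + timedelta(days=i)).isoformat(),) for i in range(n_days))
    # INSERT OR IGNORE skips dates that are already present.
    conn.executemany("INSERT OR IGNORE INTO calendar (day) VALUES (?)", rows)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    ensure_calendar(conn, date(2007, 11, 1), date(2020, 12, 31))
    print(conn.execute("SELECT MIN(day), MAX(day) FROM calendar").fetchone())
```

For SQLite to use one of these indexes, a query has to filter on the same expression, e.g. `WHERE substr(day, 1, 7) = '2007-12'`.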

That way I can query the pivot as "just another DBIC class" and everything
gets simpler.
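With the pivot in place, gap-filling is an ordinary LEFT JOIN from the pivot to the data. In DBIC the pivot would get a result class of its own; the sketch below shows only the underlying SQL, run through Python's `sqlite3` as a stand-in for PostgreSQL, aggregating per month. The table names `calendar` and `a_value` and the helpers `build_demo()` and `monthly()` are illustrative assumptions.

```python
import sqlite3
from datetime import date, timedelta

SETUP = """
CREATE TABLE calendar (day TEXT PRIMARY KEY);
CREATE TABLE a_value (
    day           TEXT PRIMARY KEY,
    other_value   INTEGER NOT NULL,
    value         INTEGER,
    another_value INTEGER
);
"""

# The pivot drives the query, so every month in the range appears;
# months without matching a_value rows get total NULL and a count of 0.
MONTHLY = """
SELECT substr(c.day, 1, 7) AS month,
       SUM(v.value)        AS total,
       COUNT(v.day)        AS days_with_data
FROM calendar AS c
LEFT JOIN a_value AS v
       ON v.day = c.day
      AND v.other_value = :other_value
WHERE c.day BETWEEN :start AND :end
GROUP BY substr(c.day, 1, 7)
ORDER BY month
"""


def build_demo(first, last, rows):
    """In-memory database with a filled calendar and the given a_value rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    n_days = (last - first).days + 1
    conn.executemany(
        "INSERT INTO calendar (day) VALUES (?)",
        (((first + timedelta(days=i)).isoformat(),) for i in range(n_days)),
    )
    conn.executemany(
        "INSERT INTO a_value (day, other_value, value) VALUES (?, ?, ?)", rows
    )
    return conn


def monthly(conn, start, end, other_value):
    """One (month, total or None, days_with_data) tuple per calendar month."""
    params = {"start": start, "end": end, "other_value": other_value}
    return conn.execute(MONTHLY, params).fetchall()
```

Swapping the grouping expression (for example `substr(c.day, 1, 4)` for years) changes the time step without touching the join, which is the simplification the pivot buys.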

-- 
      Matt S Trout       Catalyst and DBIx::Class consulting and support -
   Technical Director      http://www.shadowcat.co.uk/catalyst/
 Shadowcat Systems Ltd.  Christmas fun in collectable card game form -
                           http://www.shadowcat.co.uk/resources/2007_trading/