[Catalyst] advise on data processing in Cat/DBIC/Model
Rainer Clasen
bj at zuto.de
Mon Nov 26 15:33:02 GMT 2007
Hello,
within my current project, some value is collected up to once a day:
CREATE TABLE a_value (
day date PRIMARY KEY,
other_value integer NOT NULL,
value integer,
another_value integer
);
Data comes in a bit sporadically, so I cannot rely on each day having an
entry. There may also be longer periods (weeks/months/??) without data.
I'm currently a bit at a loss on how to "properly" cook up this data to
easily display it in fixed time steps. I'm thinking of a list of *all*
days/weeks/months/... in a certain time range. Such a list would give the
view easy access to present the data (say, as an HTML table with one row
per time step, or as input for GD::Graph).
This means there are basically two tasks:
- aggregate the data for each time step: No-brainer with DBIx::Class.
- get NULL entries for time steps without data: the interesting part.
I can come up with the following solutions to generate the NULL entries:
- use a SQL stored procedure or temp table with the start-dates of the
desired time-steps, do an outer join and stuff this in a DBIC
result_source as described in the DBIC cookbook under "arbitrary SQL".
example query for ->name():
SELECT
d.id,
steps AS day,
d.value,
COALESCE( d.other_value, $4 ) AS other_value
FROM
timeseries( $1, $2, $3) AS steps
LEFT JOIN ( SELECT * FROM a_value WHERE other_value = $4 ) d
ON ( d.day >= steps AND d.day < steps + $1 );
$1 = time steps. eg. '1 day'
$2 = start date. eg. '2007-11-1'
$3 = end date. eg '2007-11-30'
$4 = other_value to filter on.
timeseries(step,start,end) = stored procedure that returns the
start-dates of the time-steps within the specified time-range.
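For illustration, on PostgreSQL such a timeseries() could be built on top
of generate_series() - just a sketch, assuming PostgreSQL; the function
name and signature follow the parameter list above:

```sql
-- hypothetical PostgreSQL implementation of timeseries():
-- generate_series() already produces evenly spaced timestamps,
-- so the function only has to cast them back to dates.
CREATE FUNCTION timeseries(step interval, start_day date, end_day date)
RETURNS SETOF date AS $$
    SELECT d::date
    FROM generate_series($2::timestamp, $3::timestamp, $1) AS d;
$$ LANGUAGE sql STABLE;
```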
pro:
- also usable without Catalyst / DBIC.
- easy data aggregation
- same interface as non-cooked data
con:
- Outer joins tend to be slow.
- the NULL entries have to be transferred from DB->application
- the "arbitrary SQL" example makes me feel a bit sick.
- the "WHERE other_value..." subquery needs to be different in certain
cases.
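For reference, the cookbook recipe boils down to something like the
following sketch (the class name and column list are made up here):

```perl
# minimal sketch of the DBIC cookbook's "arbitrary SQL" recipe;
# MyApp::Schema::ValueSteps is an invented class name.
package MyApp::Schema::ValueSteps;
use base 'DBIx::Class';

__PACKAGE__->load_components('Core');
__PACKAGE__->table('NONE');          # virtual source, no real table
__PACKAGE__->add_columns(qw/ day value other_value /);

# the query goes into ->name() as a scalar ref:
__PACKAGE__->result_source_instance->name(
    \'( SELECT ... FROM timeseries( ?, ?, ? ) AS steps ... )'
);

1;
```

The bind values for the placeholders are then supplied per query via the
resultset's bind attribute, e.g. ->search({}, { bind => [...] }).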
- A Catalyst Model that creates a DBIC resultset and feeds it to a plain
perl "aggregation module" (which returns the aggregated data in the
desired time steps).
pro:
- aggregation module is quite easy to write
- aggregation module is also usable without Catalyst / DBIC
- allows easy tweaking of the resultset
con:
- still a bit tricky to teach the DB about the time-steps for data
aggregation (think of non-trivial steps like '4 days').
- alternatively the aggregation has to be done "manually"
- same applies for selecting the actual columns I want to
retrieve/aggregate.
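A rough sketch of what such an aggregation module might look like - the
accessors day_epoch() and value() are assumptions, the resultset is
expected to be restricted to the range and ordered by day, and averaging
is just one example of an aggregate:

```perl
# rough sketch of the plain-perl aggregation module: walk the
# range in fixed steps, drain all rows falling into the current
# step, and emit undef for steps without data.
use strict;
use warnings;

sub aggregate_steps {
    my ( $rs, $start, $end, $step ) = @_;   # times in epoch seconds

    my @out;
    my $row = $rs->next;
    for ( my $t = $start; $t < $end; $t += $step ) {
        my ( $sum, $n ) = ( 0, 0 );
        while ( $row && $row->day_epoch < $t + $step ) {
            $sum += $row->value;
            $n++;
            $row = $rs->next;
        }
        # undef marks a step without data - the "NULL entry"
        push @out, { day => $t, value => $n ? $sum / $n : undef };
    }
    return \@out;
}
```

The view can then walk the returned list directly, one entry per time
step, without caring about gaps in the underlying data.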
Which do you consider the more elegant way? Am I missing the "obvious" and
therefore trivial solution (as usual)? Am I completely nuts?
Rainer
--
KeyID=759975BD fingerprint=887A 4BE3 6AB7 EE3C 4AE0 B0E1 0556 E25A 7599 75BD