[Dbix-class] Use of sub-select in joins

Jon Schutz jon+dbix at youramigo.com
Fri May 4 13:32:21 GMT 2007


I have a problem, and a solution.  I would appreciate comments/advice on
the solution, and then I'd be happy to write up the outcomes for
addition to DBIx::Class::Manual::Joining or thereabouts.

I have a data table which records start/stop events corresponding to
certain items.  The query of interest is to find out, at any particular
time, which items were in a START state.

The formulation of the query is as follows: 

a. find the latest event prior to the given date (using max(date) and
group by event_type) for each item in a sub-select

b. find all rows that match on item and date = max(date)

Example:

mysql> select * from events;
+----+---------+------------+------------+
| id | item_id |  date      | event_type |
+----+---------+------------+------------+
| 37 |      25 | 1178175844 | START      |
| 40 |      25 | 1178175845 | START      |
| 43 |      25 | 1178175850 | STOP       |
| 38 |      26 | 1178175844 | START      |
| 42 |      26 | 1178175846 | START      |
| 44 |      26 | 1178175851 | STOP       |
| 39 |      27 | 1178175845 | START      |
| 41 |      28 | 1178175846 | START      |
+----+---------+------------+------------+

SQL that works:

SELECT * from events t1, 
 (SELECT item_id, max(date) AS mdate 
	FROM events 
	WHERE date <= 1178175846 
	GROUP BY item_id) t2 
WHERE t1.item_id = t2.item_id 
      AND t1.date = mdate
      AND t1.event_type = 'START'

or equivalently, using JOIN,

SELECT * from events t1 
JOIN  (SELECT item_id, max(date) AS mdate 
	FROM events 
	WHERE date <= 1178175846 
	GROUP BY item_id) t2 
ON t1.item_id = t2.item_id 
      AND t1.date = mdate
WHERE t1.event_type = 'START'

to produce:

+----+---------+------------+------------+---------+------------+
| id | item_id | date       | event_type | item_id | mdate      |
+----+---------+------------+------------+---------+------------+
| 40 |      25 | 1178175845 | START      |      25 | 1178175845 |
| 42 |      26 | 1178175846 | START      |      26 | 1178175846 |
| 39 |      27 | 1178175845 | START      |      27 | 1178175845 |
| 41 |      28 | 1178175846 | START      |      28 | 1178175846 |
+----+---------+------------+------------+---------+------------+

To do the sub-select in DBIx::Class, I saw the recent advice on the list
of Marc Mims who posted this:

> You can create a result source for your query:
> 
> package MySchema::Whatever;
> use base qw/DBIx::Class/;
> 
> __PACKAGE__->table('DUMMY');
> __PACKAGE__->add_columns(qw/a b c v/);
> __PACKAGE__->result_source_instance->name(\<<'');
> (select a, b, c, 'var1' as v from table1
> where d='value'
> union select a, b, c, 'var2' as v from table2
> where d='value')

First problem - my date in the sub-select is a variable, so putting this
as a static class declaration won't fly.

Second problem - I'm using deploy() to bootstrap the system, which falls
over when trying to create the table DUMMY.  (By the way, using 
deploy({source_names => \@tables}) doesn't work - I know, I should raise
a bug report...).

So, I took the following approach:

1. Separately (i.e. not in the normal schema path, to avoid it being
picked up by load_classes() and subsequently deploy()), I have the
DateSelector class:

package My::Schema::DateSelector;
use base qw/DBIx::Class/;

__PACKAGE__->load_components(qw/Core/);

__PACKAGE__->table('DUMMY');
__PACKAGE__->add_columns(qw/item_id mdate/);

2. After deploy(), I load the DateSelector class into the schema:
 
   # DateSelector is a dummy table so must bypass the deploy operation
    $self->{schema}->register_class('DateSelector', 'My::Schema::DateSelector');

3. When I need to do the query, I define the sub-select and the join relation:

    # Create table for subquery
    Local::Backlinks::Schema::DateSelector->result_source_instance->name(qq{
        (SELECT item_id, max(date) as mdate
         FROM events
         WHERE \`date\` <=  $date
         GROUP BY item_id)
        });

    my $events = $schema->resultset('Events');
    $events->result_source->add_relationship('selector' => 'My::Schema::DateSelector', {
        'foreign.item_id' => 'self.item_id',
        'foreign.mdate' => 'self.date',
        });

4. And finally, execute the query:

   my @current_items = $events->search({ event_type => 'START' },
                                       { join => 'selector' },
                                       )->related_resultset('item_id')->all;


And it works!  The resulting SQL is basically the JOIN version shown above.

So, it's a bit of mucking around, but worth it in the end, because in
reality I have another 5 joins happening after this one to get to the
final object I want, and DBIx::Class takes care of those transparently
through a series of related_resultset and search_related calls.  I've
left that out of the above to keep the explanation simple.

Is this the best way or can it be done more simply?

Is it safe to be playing with result_source_instance->name and add_relationship dynamically?

TO anyone who has read down this far - Thanks!

-- 

Jon




More information about the Dbix-class mailing list