[Dbix-class] Re: Bug in prefetch subqueries?

Thu Jul 30 12:17:38 GMT 2009

Rob Kinyon wrote:
> ribasushi -
> 
>     In a DBIC list message, you said the following:
> 
>> [1] The reason the double WHERE is needed is as follows:
>>
>> SELECT artist.*, cd.*
>>    FROM (
>>        SELECT artist.*
>>            FROM artist
>>            JOIN cd ON <some condition>
>>        WHERE XXX
>>    ) artist
>>    JOIN cd ON <some condition>
>> WHERE XXX
>>
>> In the above XXX could contain
>> 1) only artist column restrictions
>> 2) only cd column restrictions
>> 3) both
>>
>> If I could reliable determine this I would (in each of the cases):
>>
>> 1) Keep only the inner WHERE
>> 2) Keep only the outer WHERE (and maybe even remove the inner join
>> if it was of type LEFT)
>> 3) Separate the cd from the artist conditions and apply th where on
>> the relevant sides
> 
> I have found what seems to be a bug. Basically, it comes down to this:
> 
> artists:
> { id => 1, name => 'Metallica' }
> { id => 2, name => 'Linkin Park' }
> 
> cds:
> { id => 1, artist_id => 1, name => 'Fade to Black' }
> { id => 2, artist_id => 1, name => 'Kill em All' }
> { id => 3, artist_id => 2, name => 'Something' }
> 
> my $artist = $artist->search({ 'cd.name' => 'Fade to Black' }, { join
> => 'cd' })->first;
> 
> my $artist_pref = $artist->search({ 'cd.col1' => 'Fade to Black' }, {
> join => 'cd', prefetch => 'cd' })->first;
> 
> So, with $artist, when I do $artist->cds, I get two results back.
> $artist_pref->cds will only return 1 result given the SQL you put in
> the message to the list. Is this desired behavior? I would think that
> my search is asking for all the artists who have a CD with a name of
> 'Fade to Black'. Then, when I do $artist->cds, I'm asking for all the
> CDs ever released by that artist. How I got $artist is now irrelevant.
> 
> $foo->bar should return identical results with or without prefetch -
> it should just work faster with prefetch.
> 

If $foo was a resultset - I would agree. However in row context this
isn't the case. You use the variable $artist for both resultsets and
rows above (also you use .name and .col1 which makes no sense), so I
will try to rewrite what I think you are trying to say:

We start with:

$f2b_artist_rs = $schema->resultset ('Artist')
		->search (
		  { 'cd.name' => 'Fade to Black' },
		  { join => 'cd' },
		);

The count:
$f2b_artist_rs->search_related ('cd')->count
should be 1, as the (only) query will contain the cd.name condition.

The same count will be available via:
$f2b_artist_rs->search ({}, { prefetch => 'cd' })
		->search_related ('cd')
		  ->count
as there is still only 1 query.

Now if we get an artist, all we have is the artist id, and we lose
all kinds of conditions. I.e.:

$f2b_artist_rs->first->cds->count will return 2, as there will be 2
queries executed:
1) get me the first artist that has a cd named f2b
2) get me the count of all cds which have a FK equal to the artists
PK

And now we come to your semi-bug which reads:
$f2b_artist_rs->search ({}, { prefetch => 'cd' })
		->first
		  ->cds
		    ->count

The reason this returns 1, is because we execute only *one* query
(the whole point of prefetch => ). The result of the query is a
single row of artist/cd data, which is used to pre-populate all the
slots for ->has_many_accessor. In order to have the above return 2
we would have to re-query the database, effectively nullifying the
effect of prefetch. Now, if you are hell-bent on making this
correct and still maintain the prefetch == single query paradigm,
you would have to alter the prefetch mechanism to detect if there
are *any* right-side join conditions, and if this is the case do
something like:

SELECT artist.*, cd2.*
  FROM artist artist
  LEFT JOIN cd cd ON artist.id = cd.artistid
  LEFT JOIN cd cd2 ON artist.id = cd2.artistid
WHERE cd.name = ?

Effectively we pull an additional "shadow" join that will be
related to the artist rs limited by the initial join, and which
will be used for the actual object inflation. Of course in order to
make this happen we need both introspectable SQLA, and sane multi-
has-many collapse (actually the collapse may not be ncessary, as we
do not select anything from the first join). And even if we did, I
am still not sure if going to such great lengths is in fact
justified. Prefetched rs == non-prefetched rs is a flawed concept
imo, as it is not quite compatible with prefetch == one query only,
which as far as I am concerned *is* the selling point of prefetch
(db-trip latencies are the most significant bottleneck I have
encountered during my work).

Thoughts?