[Dbix-class] Q: prefetch has_many relation

Rob Kinyon rob.kinyon at gmail.com
Fri Apr 9 19:11:27 GMT 2010


What some of you are missing:

    * prefetch is an optimization that attempts to take what would
have been multiple queries and return it in a single query.
        - This means that allowing the optimization when it's patently
absurd is stupid.

    * The optimization is done by inlining all the columns of the
prefetched table(s) in with the columns of the primary table.
        - This means that, in the case of prefetching a has_many, you
will get M rows of cds per artist. This means that the post-query
analyzer has to collapse the multiple artist rows down to one object
and associate all of its cd rows with it. That's the whole point.

    * If you're inlining the columns of one or more belongs_to, it's
pretty easy to see which row(s) of the prefetched table(s) go with
which row(s) of the primary table.

    * If you're inlining the columns of a single has_many, you have a
1->N explosion. The collapser looks for common values of the primary
key of the primary table, then properly associates all the columns
from the prefetched table with that row.
        - NOTE: This means that if you don't have a primary key on
your primary table, prefetch MAY fail.

    * If you're inlining the columns of a has_many relationship of a
has_many relationship, you have a 1 -> N -> M explosion. BUT, the
explosion is pretty simple to keep organized because M rows of the
second level has_many all associate with the same row of the first
level has_many. (artist -> cds -> tracks, for example).

    * How are you supposed to inline the columns of multuple has_many
relationships on the same level? (artist -> cds and artist ->
paintings). Write out the SQL.
        - Yes, there are solutions. Most of them require a 1 -> M*N
explosion. This explosion is more costly on ALL databases than the
multiple queries prefetch is meant to optimize once M and N each get
to about 5-10 (which is when prefetching is actually useful!).
        - There is one solution which requires a 1 -> MAX(M,N)
explosion, but it's really really hairy to get right and no-one's
paying me to do it. It also requires unions.

I hope this explains some of what's going on.

Rob

Further note:
    * There are times when prefetch is *SLOWER* than the equivalent
multiple queries (MySQL and its crappy temp tables comes to mind).



More information about the DBIx-Class mailing list