[Dbix-class] Finding missing rows -- JOIN with multiple conditions

Sat Jun 1 23:22:17 GMT 2013

After stumbling around for a while I realize I need some help with building
a query with DBIC.

After writing what follows I found that I want to use a JOIN with multiple
join conditions.

What I'm after is how to write a query in DBIC with a JOIN like what is
bold below.  I can't find anything in the docs that supports this usage
unless I'm not understanding the sub {} syntax for add_relationship()

 (the schema that follows will help this make sense)

    SELECT
         musician_band.musician, band.id as band, album.id as album

    FROM
         musician_band
         LEFT JOIN band ON  musician.band =3D band.id
         LEFT JOIN album ON album.band =3D band.id
         LEFT JOIN track ON track.album =3D album.id* AND  track.songwriter=
 =3D
musician_band.musician*
         LEFT JOIN track_detail ON track_detail.id =3D track.id

    WHERE
         band.id in ( @bands )
             -- NULL thus means there's no track written by the musician on
that album.
            -- or the track was not included on the final album.
         AND (  track.id IS NULL  OR track_detail.is_on_album IS FALSE )

I have the typical music database.  I need to find musicians that played on
albums but where they were NOT the song writer on ANY track on that album.

For example, if a drummer (musician.id =3D 123) never writes songs and play=
ed
in two bands, and each band had three albums I'd want:

musician  |   band  |  album
----------+---------+---------
     123  |     22  |    45
     123  |     22  |    46
     123  |     22  |    47
     123  |     54  |    78
     123  |     54  |    92
     123  |     54  |   103

That musician played on 6 albums yet never was the songwriter for a track
on any of the albums.

My question is what is the best query to use, and how to represent that in
DBIC?

The tables are as you might expect:

A band has many albums, and albums have many tracks.   Each track has just
one songwriter which is a musician.

table musician ( id integer );

table band ( id integer );

table album (
    id integer,
    band integer REFERENCES band
);

table track (
    id integer,
    album integer REFERENCES album,
    songwriter integer REFERENCES musician
);

This can't be too easy so there's a table that manages additional details
about a track -- like if it is included on the final album.  Sorry, it's
just that way.

table track_detail (
    id integer REFERENCES track (id),
    is_on_album boolean default false
);

Of course, musicians are in many bands.  So, there's this many-to-many
table to associate musicians with bands:

table musician_band(
    musician integer REFERENCES musician,
    band integer REFERENCE band
);

Is it possible with just joins?    I thought I read DBIC could do custom
joins now.    This turns out to be *much faster* than the correlated
sub-query below.

That is, LEFT JOIN with tracks *specific to the songwriter* and check for
NULL

    SELECT
         musician_band.musician, musician.band, album.id as album

    FROM
         musician_band
         LEFT JOIN band ON  musician.band =3D band.id
         LEFT JOIN album ON album.band =3D band.id
         LEFT JOIN track ON track.album =3D album.id* AND  track.songwriter=
 =3D
musician_band.musician*
         LEFT JOIN track_detail ON track_detail.id =3D track.id
    WHERE
         band in ( @bands )
         AND (
             track.id IS NULL  -- NULL thus means there's no track written
by the musician on that album.
             OR track_detail.is_on_album IS FALSE
         )

Here, I think, is 'NOT EXISTS ($sub_query)'  correlated sub-query.  But, it
seems to be much slower due to the join on track_detail.

# List of all tracks authored by the musician
my $sub_query =3D $schema->resultset( 'Track' )->search(
    {
        songwriter =3D> { -ident =3D> 'me.musician' },
        album      =3D> { -ident =3D> 'album.id' },
        'track_detail.is_on_album' =3D> 1,
    },
    {
        columns =3D> ['id'],
        alias =3D> 'tracks_authored',
        join =3D> 'track_detail',
    },
);

# List all musicians where they do not have an associated track

my $rs =3D $schema->resultset( 'MusicianBand' )->search(
    {
        'me.band' =3D> { -in =3D> \@bands },
        'NOT EXISTS =3D> $sub_query->as_query,
    },
    {
        select =3D> [qw/ me.musician me.band album.id /],
        as =3D> [qw/ musician band album /],
        join =3D> {
            band =3D> 'album',
        },
    },
);

Wow,   Using the above correlated sub-query I see "Total runtime: 2396.274
ms".  If I remove the join track_detail JOIN in the subquery I get:  "Total
runtime: 0.229 ms"

If instead of the correlated sub-query approach I instead use the complex
JOIN it's then: "Total runtime: 0.357 ms".   Time to look at the query plan.

-- =

Bill Moseley
moseley at hank.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.scsys.co.uk/pipermail/dbix-class/attachments/20130601/c09=
0db03/attachment.htm