[Dbix-class] Finding missing rows -- JOIN with multiple conditions
Bill Moseley
moseley at hank.org
Sat Jun 1 23:22:17 GMT 2013
After stumbling around for a while I realize I need some help with building
a query with DBIC.
After writing what follows I found that I want to use a JOIN with multiple
join conditions.
What I'm after is how to write a query in DBIC with a JOIN like what is
bold below. I can't find anything in the docs that supports this usage
unless I'm not understanding the sub {} syntax for add_relationship()
(the schema that follows will help this make sense)
SELECT
musician_band.musician, band.id as band, album.id as album
FROM
musician_band
LEFT JOIN band ON musician.band =3D band.id
LEFT JOIN album ON album.band =3D band.id
LEFT JOIN track ON track.album =3D album.id* AND track.songwriter=
=3D
musician_band.musician*
LEFT JOIN track_detail ON track_detail.id =3D track.id
WHERE
band.id in ( @bands )
-- NULL thus means there's no track written by the musician on
that album.
-- or the track was not included on the final album.
AND ( track.id IS NULL OR track_detail.is_on_album IS FALSE )
I have the typical music database. I need to find musicians that played on
albums but where they were NOT the song writer on ANY track on that album.
For example, if a drummer (musician.id =3D 123) never writes songs and play=
ed
in two bands, and each band had three albums I'd want:
musician | band | album
----------+---------+---------
123 | 22 | 45
123 | 22 | 46
123 | 22 | 47
123 | 54 | 78
123 | 54 | 92
123 | 54 | 103
That musician played on 6 albums yet never was the songwriter for a track
on any of the albums.
My question is what is the best query to use, and how to represent that in
DBIC?
The tables are as you might expect:
A band has many albums, and albums have many tracks. Each track has just
one songwriter which is a musician.
table musician ( id integer );
table band ( id integer );
table album (
id integer,
band integer REFERENCES band
);
table track (
id integer,
album integer REFERENCES album,
songwriter integer REFERENCES musician
);
This can't be too easy so there's a table that manages additional details
about a track -- like if it is included on the final album. Sorry, it's
just that way.
table track_detail (
id integer REFERENCES track (id),
is_on_album boolean default false
);
Of course, musicians are in many bands. So, there's this many-to-many
table to associate musicians with bands:
table musician_band(
musician integer REFERENCES musician,
band integer REFERENCE band
);
Is it possible with just joins? I thought I read DBIC could do custom
joins now. This turns out to be *much faster* than the correlated
sub-query below.
That is, LEFT JOIN with tracks *specific to the songwriter* and check for
NULL
SELECT
musician_band.musician, musician.band, album.id as album
FROM
musician_band
LEFT JOIN band ON musician.band =3D band.id
LEFT JOIN album ON album.band =3D band.id
LEFT JOIN track ON track.album =3D album.id* AND track.songwriter=
=3D
musician_band.musician*
LEFT JOIN track_detail ON track_detail.id =3D track.id
WHERE
band in ( @bands )
AND (
track.id IS NULL -- NULL thus means there's no track written
by the musician on that album.
OR track_detail.is_on_album IS FALSE
)
Here, I think, is 'NOT EXISTS ($sub_query)' correlated sub-query. But, it
seems to be much slower due to the join on track_detail.
# List of all tracks authored by the musician
my $sub_query =3D $schema->resultset( 'Track' )->search(
{
songwriter =3D> { -ident =3D> 'me.musician' },
album =3D> { -ident =3D> 'album.id' },
'track_detail.is_on_album' =3D> 1,
},
{
columns =3D> ['id'],
alias =3D> 'tracks_authored',
join =3D> 'track_detail',
},
);
# List all musicians where they do not have an associated track
my $rs =3D $schema->resultset( 'MusicianBand' )->search(
{
'me.band' =3D> { -in =3D> \@bands },
'NOT EXISTS =3D> $sub_query->as_query,
},
{
select =3D> [qw/ me.musician me.band album.id /],
as =3D> [qw/ musician band album /],
join =3D> {
band =3D> 'album',
},
},
);
Wow, Using the above correlated sub-query I see "Total runtime: 2396.274
ms". If I remove the join track_detail JOIN in the subquery I get: "Total
runtime: 0.229 ms"
If instead of the correlated sub-query approach I instead use the complex
JOIN it's then: "Total runtime: 0.357 ms". Time to look at the query plan.
-- =
Bill Moseley
moseley at hank.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.scsys.co.uk/pipermail/dbix-class/attachments/20130601/c09=
0db03/attachment.htm
More information about the DBIx-Class
mailing list