[Dbix-class] Re: how to use a complex query in DBIx::Class
A. Pagaltzis
pagaltzis at gmx.de
Sun Dec 31 05:05:26 GMT 2006
* Matt LeBlanc <mleblanc at cpan.org> [2006-12-31 03:40]:
> On 12/30/06, A. Pagaltzis <pagaltzis at gmx.de> wrote:
> >You're abusing the ON clause. You're not joining on the userid
> >condition; you're constraing the result set. You should put it
> >in the WHERE clause:
>
> Forgive my ignorance but could you please give me a reason why
> this is an abuse? This is standard ANSI SQL.
How does “abuse” imply it’s not in the standard? Finding out the
length of an array in Perl by saying `$#array + 1` is valid and
working Perl as well, and yet it’s abuse, because the semantics
are off.
> > SELECT
> > b.id,
> > CASE WHEN h.userid IS NOT NULL THEN 1 ELSE 0 END AS hidden
> > FROM boards b
> > LEFT JOIN hiddenboards h ON h.boardid = b.id
> > WHERE
> > b.active = 1
> > AND h.userid = 3
> > ORDER BY b.pos
> >
>
> Did you run this query? Did it return the same results as my
> query? Just in case, I will now present the results for you.
>
> id,hidden
> 2,1
Ah, interesting. I did not realise that the ON clause behaviour
is affected in this way because of the LEFT JOIN, although in
retrospect it is obvious. (Jess didn’t catch this point either,
I notice.) I’m pretty sure that issue would catch out a *lot*
more people glancing at the query than just me and her.
This is what I mean by abuse: the intent is not very explicit in
the original formulation of the query.
> To remove the "abuse" that I had, a simple method would be to
> do something like this:
>
> SELECT
> b.id,
> CASE WHEN h.userid IS NOT NULL THEN 1 ELSE 0 END AS hidden
> FROM boards b
> LEFT JOIN hiddenboards h ON h.boardid = b.id
> WHERE
> b.active = 1 AND
> (h.userid = 3 OR h.userid IS NULL)
> ORDER BY b.pos
>
> This results in:
> id,hidden
> 2,1
> 3,0
>
> Urg, we still don't have the first row. Let's move the where
> condition into the case statement.
Yeah, I was going to point that out before you showed the
results.
> SELECT
> b.id,
> CASE WHEN h.userid = 3 THEN 1 ELSE 0 END AS hidden
> FROM boards b
> LEFT JOIN hiddenboards h ON h.boardid = b.id
> WHERE
> b.active = 1
> ORDER BY b.pos
>
> Eureka!
> id,hidden
> 1,0
> 2,1
> 3,0
>
>
> But wait! userId 3 sets board 1 to hidden as well.
>
> INSERT INTO hiddenboards VALUES(3,1);
>
> id,hidden
> 1,0
> 1,1
> 2,1
> 3,0
Yes, I was going to make this point about the original query but
refrained because as long as the userid is constrained with a
condition it cannot happen. Once you removed the constraint, you
got duplicate rows. This *is* a join, after all; duplicating rows
is what it does.
> >But this is really an abuse of joining, because you're not
> >actually interested in joining data,
>
> As a matter of fact, I am. This is why the left join was used.
Not in the final result. You only want to know *whether* a
matching row for the pair of given userid and examined boardid
exists, but you’re not interested in returning any data from that
row.
> >only in checking, which should be written as a correlated
> >subquery:
>
> Subqueries are generally expensive, depending upon the database
> engine. With the left join, the database can use an index if
> one is available in coming up with all of the rows. The
> subqueries you have would also be able to use the index but one
> at a time as these will involve a scan of the rows returned
> since they are not run until the very end of the joining
> process. In most circumstances, a subquery is the wrong answer.
My experience has been the exact opposite. I’ve rewritten quite a
few queries to use subqueries instead of horribly convoluted
joins (for some inexplicable reason, MySQL people seem to have a
habit of writing these…), and performance has usually benefited
from it, sometimes spectacularly so.
Not that I shun joins; but I’ve found that it’s rarely
problematic for performance when queries are written to reveal
intent and always a boon for maintenance, so I use a subquery
when that’s what expresses intent more closely and a join when
that’s what does.
> > SELECT
> > b.id,
> > EXISTS (
> > SELECT NULL
> > FROM hiddenboards h
> > WHERE
> > h.userid = 3
> > AND h.boardid = b.id
> > ) AS hidden
> > FROM boards b
> > WHERE b.active = 1
> > ORDER BY b.pos
> >
> >Note that I use EXISTS and select nothing but NULL to make it
> >clear that the only fact of interest is whether a matching row
> >exists in `hiddenboards`.
> >
> >In that way it is self-documenting what you're trying to ask
> >for.
>
> If it allows you to read the query more readily, then use it.
> However, you now know that there is a more efficient solution
> that is acceptable to most individuals well-versed in the use
> of SQL.
Programs must be written for people to read, and only
incidentally for machines to execute.
— Abelson & Sussman
I don’t care about efficiency unless it’s a problem.
However, it occured to me that there is no need to use a
correlated subquery in this case anyway. I wrote it that way
because I was still blinkered by the JOIN abuse when I rewrote
that query.
SELECT
id,
id NOT IN (
SELECT boardid
FROM hiddenboards
WHERE userid = 3
) AS hidden
FROM boards
WHERE active = 1
ORDER BY pos
Yes, I tested it; it works correctly. Now go on, tell me that’s
not much simpler and easier to understand than the original
query. Go on, tell me that’s going to be slower than a join.
> No offense, but your "purist" view on SQL is completely odd to
> me.
It’s a purist view on programming; SQL is incidental.
> Why would I not want to use everything the language has to
> offer to return the data I want? Using left joins so you can
> check for null values is a common usage.
For the same reason you’d sometimes use `@array - 1` and
sometimes `$#array`, yet not use them interchangably.
> Would you not use regular expressions if someone told
> you that they were a hack and that you should instead use
> index, rindex, and substr?
Oh, that is an excellent example. Is the following not abuse of
regular expressions?
if( $foo =~ /^.{5}$/s ) { ... }
Why would you write that and not this?
if( 5 == length $foo ) { ... }
Or this, is it not abuse?
if( $foo =~ /^foobar$/ ) { ... }
Why not just say this?
if( $foo eq 'foobar' ) { ... }
People use regular expressions inappropriately quite frequently.
That doesn’t mean regular expressions are a hack. It does mean
that they’re not always the right tool for the job.
Regards,
--
Aristotle Pagaltzis // <http://plasmasturm.org/>
More information about the Dbix-class
mailing list