[Dbix-class] Re: how to use a complex query in DBIx::Class

A. Pagaltzis pagaltzis at gmx.de
Sun Dec 31 05:05:26 GMT 2006


* Matt LeBlanc <mleblanc at cpan.org> [2006-12-31 03:40]:
> On 12/30/06, A. Pagaltzis <pagaltzis at gmx.de> wrote:
> >You're abusing the ON clause. You're not joining on the userid
> >condition; you're constraing the result set. You should put it
> >in the WHERE clause:
> 
> Forgive my ignorance but could you please give me a reason why
> this is an abuse? This is standard ANSI SQL.

How does “abuse” imply it’s not in the standard? Finding out the
length of an array in Perl by saying `$#array + 1` is valid and
working Perl as well, and yet it’s abuse, because the semantics
are off.

> >    SELECT
> >        b.id,
> >        CASE WHEN h.userid IS NOT NULL THEN 1 ELSE 0 END AS hidden
> >    FROM boards b
> >    LEFT JOIN hiddenboards h ON h.boardid = b.id
> >    WHERE
> >        b.active = 1
> >        AND h.userid = 3
> >    ORDER BY b.pos
> >
> 
> Did you run this query? Did it return the same results as my
> query? Just in case, I will now present the results for you.
> 
> id,hidden
> 2,1

Ah, interesting. I did not realise that the ON clause behaviour
is affected in this way because of the LEFT JOIN, although in
retrospect it is obvious. (Jess didn’t catch this point either,
I notice.) I’m pretty sure that issue would catch out a *lot*
more people glancing at the query than just me and her.

This is what I mean by abuse: the intent is not very explicit in
the original formulation of the query.

> To remove the "abuse" that I had, a simple method would be to
> do something like this:
> 
> SELECT
>     b.id,
>     CASE WHEN h.userid IS NOT NULL THEN 1 ELSE 0 END AS hidden
> FROM boards b
>    LEFT JOIN hiddenboards h ON h.boardid = b.id
> WHERE
>     b.active = 1 AND
>     (h.userid = 3 OR h.userid IS NULL)
> ORDER BY b.pos
> 
> This results in:
> id,hidden
> 2,1
> 3,0
> 
> Urg, we still don't have the first row. Let's move the where
> condition into the case statement.

Yeah, I was going to point that out before you showed the
results.

> SELECT
>     b.id,
>     CASE WHEN h.userid = 3 THEN 1 ELSE 0 END AS hidden
> FROM boards b
>    LEFT JOIN hiddenboards h ON h.boardid = b.id
> WHERE
>     b.active = 1
> ORDER BY b.pos
> 
> Eureka!
> id,hidden
> 1,0
> 2,1
> 3,0
> 
> 
> But wait! userId 3 sets board 1 to hidden as well.
> 
> INSERT INTO hiddenboards VALUES(3,1);
> 
> id,hidden
> 1,0
> 1,1
> 2,1
> 3,0

Yes, I was going to make this point about the original query but
refrained because as long as the userid is constrained with a
condition it cannot happen. Once you removed the constraint, you
got duplicate rows. This *is* a join, after all; duplicating rows
is what it does.

> >But this is really an abuse of joining, because you're not
> >actually interested in joining data,
> 
> As a matter of fact, I am. This is why the left join was used.

Not in the final result. You only want to know *whether* a
matching row for the pair of given userid and examined boardid
exists, but you’re not interested in returning any data from that
row.

> >only in checking, which should be written as a correlated
> >subquery:
> 
> Subqueries are generally expensive, depending upon the database
> engine. With the left join, the database can use an index if
> one is available in coming up with all of the rows. The
> subqueries you have would also be able to use the index but one
> at a time as these will involve a scan of the rows returned
> since they are not run until the very end of the joining
> process. In most circumstances, a subquery is the wrong answer.

My experience has been the exact opposite. I’ve rewritten quite a
few queries to use subqueries instead of horribly convoluted
joins (for some inexplicable reason, MySQL people seem to have a
habit of writing these…), and performance has usually benefited
from it, sometimes spectacularly so.

Not that I shun joins; but I’ve found that it’s rarely
problematic for performance when queries are written to reveal
intent and always a boon for maintenance, so I use a subquery
when that’s what expresses intent more closely and a join when
that’s what does.

> >    SELECT
> >        b.id,
> >        EXISTS (
> >            SELECT NULL
> >            FROM hiddenboards h
> >            WHERE
> >                h.userid = 3
> >                AND h.boardid = b.id
> >        ) AS hidden
> >    FROM boards b
> >    WHERE b.active = 1
> >    ORDER BY b.pos
> >
> >Note that I use EXISTS and select nothing but NULL to make it
> >clear that the only fact of interest is whether a matching row
> >exists in `hiddenboards`.
> >
> >In that way it is self-documenting what you're trying to ask
> >for.
> 
> If it allows you to read the query more readily, then use it.
> However, you now know that there is a more efficient solution
> that is acceptable to most individuals well-versed in the use
> of SQL.

    Programs must be written for people to read, and only
    incidentally for machines to execute.
                                      — Abelson & Sussman

I don’t care about efficiency unless it’s a problem.

However, it occured to me that there is no need to use a
correlated subquery in this case anyway. I wrote it that way
because I was still blinkered by the JOIN abuse when I rewrote
that query.

    SELECT
        id,
        id NOT IN (
            SELECT boardid
            FROM hiddenboards
            WHERE userid = 3
        ) AS hidden
    FROM boards
    WHERE active = 1
    ORDER BY pos

Yes, I tested it; it works correctly. Now go on, tell me that’s
not much simpler and easier to understand than the original
query. Go on, tell me that’s going to be slower than a join.

> No offense, but your "purist" view on SQL is completely odd to
> me.

It’s a purist view on programming; SQL is incidental.

> Why would I not want to use everything the language has to
> offer to return the data I want? Using left joins so you can
> check for null values is a common usage.

For the same reason you’d sometimes use `@array - 1` and
sometimes `$#array`, yet not use them interchangably.

> Would you not use regular expressions if someone told
> you that they were a hack and that you should instead use
> index, rindex, and substr?

Oh, that is an excellent example. Is the following not abuse of
regular expressions?

    if( $foo =~ /^.{5}$/s ) { ... }

Why would you write that and not this?

    if( 5 == length $foo ) { ... }

Or this, is it not abuse?

    if( $foo =~ /^foobar$/ ) { ... }

Why not just say this?

    if( $foo eq 'foobar' ) { ... }

People use regular expressions inappropriately quite frequently.
That doesn’t mean regular expressions are a hack. It does mean
that they’re not always the right tool for the job.

Regards,
-- 
Aristotle Pagaltzis // <http://plasmasturm.org/>



More information about the Dbix-class mailing list