[Dbix-class] Re: how to use a complex query in DBIx::Class

Matt LeBlanc mleblanc at cpan.org
Sun Dec 31 02:29:27 GMT 2006


On 12/30/06, A. Pagaltzis <pagaltzis at gmx.de> wrote:
> * Matt LeBlanc <mleblanc at cpan.org> [2006-12-30 08:10]:
> > Sorry, I just realized I used the wrong field. Of course, this
> > doesn't matter as if the join condition fails, anything from
> > hiddenboards is null. Here is a full example of how this would
> > work:
> >
> > CREATE TABLE boards (
> >       id int,
> >       active int,
> >       pos int
> > );
> >
> > INSERT INTO boards VALUES(1,1,1);
> > INSERT INTO boards VALUES(2,1,2);
> > INSERT INTO boards VALUES(3,1,3);
> >
> > CREATE TABLE hiddenboards (
> >       userid int,
> >       boardId int
> > );
> >
> > INSERT INTO hiddenboards VALUES(1,1); -- not userid 3, so no join
> > INSERT INTO hiddenboards VALUES(3,2); -- userid 3, so valid join
> > -- no entry for id 3 so no join
> >
> > SELECT boards.id,CASE WHEN hiddenboards.userid is not null THEN 1 ELSE 0 END AS hidden
> > FROM boards LEFT JOIN hiddenboards
> > ON hiddenboards.userid = 3
> > AND hiddenboards.boardid = boards.id
> > WHERE boards.active = 1
> > ORDER BY pos;
> >
> > In this particular case, the rows returned are as follows:
> >
> > id,hidden
> > 1,0
> > 2,1
> > 3,0
>
> You're abusing the ON clause. You're not joining on the userid
> condition; you're constraing the result set. You should put it in
> the WHERE clause:

Forgive my ignorance but could you please give me a reason why this is
an abuse? This is standard ANSI SQL.

>
>     SELECT
>         b.id,
>         CASE WHEN h.userid IS NOT NULL THEN 1 ELSE 0 END AS hidden
>     FROM boards b
>     LEFT JOIN hiddenboards h ON h.boardid = b.id
>     WHERE
>         b.active = 1
>         AND h.userid = 3
>     ORDER BY b.pos
>

Did you run this query? Did it return the same results as my query?
Just in case, I will now present the results for you.

id,hidden
2,1

As you can see, these are not the same results. hiddenboards.userId is
checked in the join condition because it is used to specify when I
wish to join. If you put it in the where clause, we join on rows where
we don't want the join to occur. For instance, if you remove the where
condition, we will get all of the rows but we will also mistakenly get
the first board shown as hidden. But since we do have the
hiddenboards.userId = 3 condition in the where clause, both the first
row, which has a hiddenboards.userId = 1, and the third row, which has
a null hiddenboards.userId, are filtered out. This is what the where
expression does. It filters results after all of our joins have been
completed. To remove the "abuse" that I had, a simple method would be
to do something like this:

SELECT
     b.id,
     CASE WHEN h.userid IS NOT NULL THEN 1 ELSE 0 END AS hidden
FROM boards b
    LEFT JOIN hiddenboards h ON h.boardid = b.id
WHERE
     b.active = 1 AND
     (h.userid = 3 OR h.userid IS NULL)
ORDER BY b.pos

This results in:
id,hidden
2,1
3,0

Urg, we still don't have the first row. Let's move the where condition
into the case statement.

SELECT
     b.id,
     CASE WHEN h.userid = 3 THEN 1 ELSE 0 END AS hidden
FROM boards b
    LEFT JOIN hiddenboards h ON h.boardid = b.id
WHERE
     b.active = 1
ORDER BY b.pos

Eureka!
id,hidden
1,0
2,1
3,0


But wait! userId 3 sets board 1 to hidden as well.

INSERT INTO hiddenboards VALUES(3,1);

id,hidden
1,0
1,1
2,1
3,0

I guess the easiest solution is to "abuse" left joins or go with an
expensive subquery.

To ward off a possible argument that my solution provides duplicate
results if hiddenboards has two rows where userid is 3 and the boardid
is the same, I doubt this is how this table would be used since it
would seem that the purpose of this table is to say for whom the board
is hidden. If I am wrong about this, I invite Wan, the original
poster, to correct me.

> But this is really an abuse of joining, because you're not
> actually interested in joining data,

As a matter of fact, I am. This is why the left join was used.

> only in checking, which
> should be written as a correlated subquery:
>

Subqueries are generally expensive, depending upon the database
engine. With the left join, the database can use an index if one is
available in coming up with all of the rows. The subqueries you have
would also be able to use the index but one at a time as these will
involve a scan of the rows returned since they are not run until the
very end of the joining process. In most circumstances, a subquery is
the wrong answer.

>     SELECT
>         b.id,
>         EXISTS (
>             SELECT NULL
>             FROM hiddenboards h
>             WHERE
>                 h.userid = 3
>                 AND h.boardid = b.id
>         ) AS hidden
>     FROM boards b
>     WHERE b.active = 1
>     ORDER BY b.pos
>
> Note that I use EXISTS and select nothing but NULL to make it
> clear that the only fact of interest is whether a matching row
> exists in `hiddenboards`.
>
> In that way it is self-documenting what you're trying to ask for.

If it allows you to read the query more readily, then use it. However,
you now know that there is a more efficient solution that is
acceptable to most individuals well-versed in the use of SQL.

>
> Regards,
> --
> Aristotle Pagaltzis // <http://plasmasturm.org/>

No offense, but your "purist" view on SQL is completely odd to me. Why
would I not want to use everything the language has to offer to return
the data I want? Using left joins so you can check for null values is
a common usage. Would you not use regular expressions if someone told
you that they were a hack and that you should instead use index,
rindex, and substr?



More information about the Dbix-class mailing list