[Dbix-class] trying to figure out why DBIc query is slow

Frank Sheiness frank at korcett.com
Thu Nov 8 16:55:03 GMT 2012


Aaron,

Thanks for your reply.  There are differences between the queries, but I
have confirmed multiple times that using psql or a DBI script, both queries
return in under a second (with the same date ranges).  The only time the
second query takes 10 times longer is with the DBIc code.  Here are the
EXPLAINs:


FAST:


                                                               QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=0.00..2810.82 rows=79 width=58) (actual time=0.118..1.374 rows=576 loops=1)
   Merge Cond: (me.sample_time = myself.sample_time)
   ->  Index Scan using datasample_pkey on datasample me  (cost=0.00..1404.13 rows=355 width=50) (actual time=0.071..0.330 rows=576 loops=1)
         Index Cond: ((dataset_id = 3165) AND (sample_time >= '2012-10-01 00:00:00-05'::timestamp with time zone) AND (sample_time <= '2012-11-03 23:59:00-05'::timestamp with time zone))
   ->  Index Scan using datasample_pkey on datasample myself  (cost=0.00..1404.13 rows=355 width=16) (actual time=0.041..0.326 rows=576 loops=1)
         Index Cond: ((myself.dataset_id = 3166) AND (myself.sample_time >= '2012-10-01 00:00:00-05'::timestamp with time zone) AND (myself.sample_time <= '2012-11-03 23:59:00-05'::timestamp with time zone))
 Total runtime: 1.558 ms

SLOW:

                                                               QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=12191.02..12191.02 rows=1 width=58) (actual time=17.848..17.975 rows=576 loops=1)
   Sort Key: me.sample_time
   Sort Method:  quicksort  Memory: 93kB
   ->  Nested Loop  (cost=7247.68..12191.01 rows=1 width=58) (actual time=0.557..17.386 rows=576 loops=1)
         ->  Nested Loop  (cost=7247.68..11555.00 rows=65 width=24) (actual time=0.544..14.342 rows=576 loops=1)
               ->  Index Scan using dataset_server_id_key on dataset  (cost=0.00..4306.01 rows=2 width=8) (actual time=0.331..5.006 rows=1 loops=1)
                     Index Cond: ((device_id = 659) AND ((name)::text = 'BytT'::text))
                     Filter: ((label)::text ~~ '%[1]%'::text)
               ->  Materialize  (cost=7247.68..7248.01 rows=33 width=16) (actual time=0.211..9.060 rows=576 loops=1)
                     ->  Nested Loop  (cost=0.00..7247.64 rows=33 width=16) (actual time=0.207..8.567 rows=576 loops=1)
                           ->  Index Scan using dataset_server_id_key on dataset dataset_2  (cost=0.00..4307.77 rows=2 width=8) (actual time=0.159..7.892 rows=1 loops=1)
                                 Index Cond: ((device_id = 659) AND ((name)::text = 'BytR'::text))
                                 Filter: ((label)::text ~~ '%[1]%'::text)
                           ->  Index Scan using datasample_pkey on datasample myself  (cost=0.00..1465.30 rows=371 width=24) (actual time=0.045..0.381 rows=576 loops=1)
                                 Index Cond: ((myself.dataset_id = dataset_2.dataset_id) AND (myself.sample_time >= '2012-10-01 00:00:00-05'::timestamp with time zone) AND (myself.sample_time <= '2012-11-03 23:59:00-05'::timestamp with time zone))
         ->  Index Scan using datasample_pkey on datasample me  (cost=0.00..9.77 rows=1 width=50) (actual time=0.003..0.004 rows=1 loops=576)
               Index Cond: ((me.dataset_id = dataset.dataset_id) AND (me.sample_time >= '2012-10-01 00:00:00-05'::timestamp with time zone) AND (me.sample_time <= '2012-11-03 23:59:00-05'::timestamp with time zone) AND (me.sample_time = myself.sample_time))
 Total runtime: 18.275 ms
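
For reference, here is a minimal sketch (not the actual application code) of
one way to capture the SQL that DBIc actually sends, so the identical
statement can be EXPLAINed in psql.  $schema is assumed to be an
already-connected schema object, and the resultset/search below is only a
hypothetical stand-in for the real slow search:

    use strict;
    use warnings;
    use Data::Dumper;

    # Log every statement DBIC runs, with its bind values, to STDERR.
    # Equivalent to running the script with DBIC_TRACE=1 in the environment.
    $schema->storage->debug(1);

    # Or inspect the generated SQL without executing it.
    my $rs = $schema->resultset('Datasample')->search(
        {
            'me.sample_time' => {
                '>=' => '2012-10-01',
                '<=' => '2012-11-03 23:59',
            },
        },
        { order_by => 'me.sample_time' },
    );
    print Dumper( ${ $rs->as_query } );    # SQL string followed by bind values

    # And the HashRefInflator usage mentioned in my first message, which
    # skips row-object inflation so "slow SQL" can be separated from
    # "slow inflation":
    $rs->result_class('DBIx::Class::ResultClass::HashRefInflator');
    my @rows = $rs->all;                   # plain hashrefs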



On Thu, Nov 8, 2012 at 10:24 AM, Aaron Crane <dbix-class at aaroncrane.co.uk> wrote:

> Frank Sheiness <frank at korcett.com> wrote:
> > I have two postgres queries that return the same data set.  One has some
> > hard-coded values, the other uses some joins to figure out which rows to
> > get.  Both queries run very quickly via the psql client and a DBI script.
> > Using DBIc, the latter query runs very slowly.  I've tried using
> > HashRefInflator but it's still very slow.  Here are the two queries:
> >
> > SELECT me.dataset_id, me.sample_time, me.sample_mean, me.sample_stdv,
> > me.sample_min, me.sample_max, me.sample_size, myself.sample_mean FROM
> > datasample me JOIN datasample myself ON myself.sample_time = me.sample_time
> > WHERE ( ( ( me.sample_time >= ? AND me.sample_time <= ? AND
> > myself.sample_time >= ? AND myself.sample_time <= ? ) AND me.dataset_id = ?
> > AND myself.dataset_id = ? ) ) ORDER BY me.sample_time: '2012-10-01',
> > '2012-11-06 23:59', '2012-10-01', '2012-11-06 23:59', '3165', '3166'
> >
> > SELECT me.dataset_id, me.sample_time, me.sample_mean, me.sample_stdv,
> > me.sample_min, me.sample_max, me.sample_size, myself.sample_mean FROM
> > datasample me JOIN dataset dataset ON dataset.dataset_id = me.dataset_id
> > JOIN datasample myself ON myself.sample_time = me.sample_time JOIN dataset
> > dataset_2 ON dataset_2.dataset_id = myself.dataset_id WHERE ( ( (
> > me.sample_time >= ? AND me.sample_time <= ? AND myself.sample_time >= ? AND
> > myself.sample_time <= ? ) AND dataset.device_id = ? AND dataset.label LIKE ?
> > AND dataset.name = ? AND dataset_2.device_id = ? AND dataset_2.label LIKE ?
> > AND dataset_2.name = ? ) ) ORDER BY me.sample_time: '2012-10-01',
> > '2012-11-07 23:59', '2012-10-01', '2012-11-07 23:59', '659', '%[1]%',
> > 'BytT', '659', '%[1]%', 'BytR'
>
> It's not clear to me that those two queries should have comparable
> performance.  I see three differences between them:
>
> - The slow query considers a time period one day longer than the fast
> query's
> - The slow query joins four tables, where the fast query joins only two
> - The slow query contains two separate `LIKE '%[1]%'` clauses
>
> My guess is that the last point is the biggest cause of the
> performance difference you're seeing.  In practice, databases can't
> readily optimise LIKE comparisons, and even in the presence of an index
> on the relevant column, a LIKE pattern that begins with a wildcard is
> almost certain to require at least a partial table scan.  In this
> case, if you don't also have a suitable index on dataset.device_id
> and/or dataset.name, you'll probably get a full table scan.
>
> I'm aware that you say the two queries have comparable performance
> when run from the psql client, but could I ask you to double-check?
>
> You may also find Postgres's EXPLAIN useful.
>
> --
> Aaron Crane ** http://aaroncrane.co.uk/
>
> _______________________________________________
> List: http://lists.scsys.co.uk/cgi-bin/mailman/listinfo/dbix-class
> IRC: irc.perl.org#dbix-class
> SVN: http://dev.catalyst.perl.org/repos/bast/DBIx-Class/
> Searchable Archive:
> http://www.grokbase.com/group/dbix-class@lists.scsys.co.uk
>
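
(On the LIKE point above, purely as an illustration and not something the
app does today: assuming the pg_trgm extension is available on this Postgres
instance, a trigram GIN index is one way to let a leading-wildcard pattern
such as '%[1]%' use an index rather than a filter.  The index name below is
made up, and $schema is assumed to be a connected DBIx::Class schema.)

    # Hypothetical sketch only, run through the schema's DBI handle.
    my $dbh = $schema->storage->dbh;
    $dbh->do('CREATE EXTENSION IF NOT EXISTS pg_trgm');
    $dbh->do('CREATE INDEX dataset_label_trgm_idx '
           . 'ON dataset USING gin (label gin_trgm_ops)');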



--

   Frank Sheiness - VP of Software Development - Korcett Holdings, Inc.
   frank.sheiness at korcett.com   aim:txrrfrank   http://www.korcett.com/
   [ 512-419-7419 x 205 (o) ] [ 512-417-8825 (c) ] [ 512-419-7680 (f) ]