[Dbix-class] Why is prefetch so slow?

Tue Oct 16 08:51:27 GMT 2007

On Fri, 2007-10-12 at 17:01 +0930, Jon Schutz wrote:
> In my code I have a DBIx::Class query with prefetch that is taking about
> 1.2 seconds to execute.  If I remove the prefetch, it takes 0.2 seconds.
> The actual SQL query takes 0.00 seconds with or without the join.  The
> number of records is the same with and without the prefetch/join.
> 
> I appreciate the DBIx::Class is building additional objects on prefetch,
> but surely that can't have that great an overhead!
> 
> Using '+select' instead of prefetch adds much less extra time (but
> screws up the otherwise clean object model in the code).
> 
> To demonstrate, I have pared down the code and put together a test case
> (attached), and run it on various servers.  It is a very simple schema.
> The code with prefetch takes about 4 times longer to run.  The
> equivalent code using DBI has almost constant run time with/without the
> join.
> 
> Sample results:
> 
> # Server A
> Operation with prefetch took 0.146116 secs
> Operation without prefetch took 0.035811 secs
> Repeat operation with prefetch took 0.15038 secs
> DBI operation with prefetch took 0.000979 secs
> DBI operation without prefetch took 0.000793 secs
> Repeat DBI operation with prefetch took 0.000762 secs
> 
> # Server B
> Operation with prefetch took 0.085375 secs
> Operation without prefetch took 0.0166269999999999 secs
> Repeat operation with prefetch took 0.082367 secs
> DBI operation with prefetch took 0.001724 secs
> DBI operation without prefetch took 0.001696 secs
> Repeat DBI operation with prefetch took 0.001644 secs
> 
> # Server C
> Operation with prefetch took 0.073623 secs
> Operation without prefetch took 0.014567 secs
> Repeat operation with prefetch took 0.070953 secs
> DBI operation with prefetch took 0.001508 secs
> DBI operation without prefetch took 0.001418 secs
> Repeat DBI operation with prefetch took 0.001377 secs
> 
> # Server D
> Operation with prefetch took 0.39944 secs
> Operation without prefetch took 0.114976 secs
> Repeat operation with prefetch took 0.422341 secs
> DBI operation with prefetch took 0.001064 secs
> DBI operation without prefetch took 0.000984 secs
> Repeat DBI operation with prefetch took 0.000919 secs
> 
> Reliably, the 'without prefetch' time is about 4 times the 'with
> prefetch' times for DBIx::Class, but almost equal for DBI.
> 
> Servers are running perl 5.8.8, DBIx::Class 0.08007, Linux (CentOS 4 and 5), MySQL 4 and 5.
> 
> Timings of DBI vs DBIx::Class are not a concern; for obvious reasons we
> expect a difference there.
> 
> Is anyone aware of this performance issue, or, even better, does anyone
> know how to fix it???

Having spent a good part of the day with a profiler and trace
statements, I believe I have identified the major source of the problem
- 'use overload' in DBIx::Class::ResultSourceHandle.

Out of the box DBIx::Class:

  Operation with prefetch took 0.313716 secs 
  Operation without prefetch took 0.095531 secs

Results with 'use overload' commented out:

  Operation with prefetch took 0.068018 secs
  Operation without prefetch took 0.014182 secs

i.e. it's 5 TIMES FASTER without the overload.  (Ratio of prefetch to
without prefetch is not much different, but that's OK as 5x faster is
much more acceptable).

It's not the 'use overload' per se, but any function overloading; i.e.
commenting out all three of the following lines from
DBIx::Class::ResultSourceHandle, or just the second line, achieves the
same improvement.

use overload
    q/""/ => sub { __PACKAGE__ . ":" . shift->source_moniker; },
    fallback => 1;

To inflate 100 rows with prefetch (so, 200 objects in this case), causes
404 invocations of DBIx::Class::ResultSourceHandle::new.  With the
overload, the cost is about 0.0007s per call, so about 0.3 seconds total
(this is the time with the profiler enabled, which adds its own cost).
Without overload, the time is 0.000015s.

I see that overload is used in a number of other modules - ResultSet,
Exception, Storage, CDBICompat/Stringify.pm.  I haven't investigated the
impact of those, but presumably the impact is less significant as those
operations are not called as frequently as
DBIx::Class::ResultSourceHandle::new.

So, what would be the implications of removing the overloaded string
operator?  If overload is removed, the only test that fails is
84serialize.t.  It seems the main purpose of the overload is to tidy up
Data::Dumper output.  It also avoids a full comparison if the
ResultSourceHandle class by is_deeply in 84serialize.t.

I don't know if there is a clean solution for the Data::Dumper issue,
but given the choice between 5x performance improvement and convenience
of debug output, I'll take the first almost every time.  A suggested
patch against 0.08007 is attached, to that end.

-- 

Jon

-------------- next part --------------
A non-text attachment was scrubbed...
Name: dbic-resultsourcehandle.patch
Type: text/x-patch
Size: 1441 bytes
Desc: not available
Url : http://lists.scsys.co.uk/pipermail/dbix-class/attachments/20071016/33b1ebbe/dbic-resultsourcehandle-0001.bin