[Dbix-class] Why is this so slow?

"Alex J. G. Burzyński" mailing-dbix at ajgb.net
Tue Jun 8 14:05:17 GMT 2010


Hi,

On 08/06/10 14:09, Chris Cole wrote:
> Given the schema provided at the end, why is this code so slow?
>
> sub getCounts ($) {
>    my $self = shift;
>    my $dataset = shift;
>
>    croak "ERROR - dataset name is required\n" unless ($dataset);
>
>    my $nHits;
>
>    # make SQL search both technical replicates at the same time
>    $dataset =~ s/t\d/t%/;
>    my $rs = $self->resultset('NgsMappings')->search(
>       {
>          'me_sample' => {'like', $dataset},
>       },
>       {
>          columns => [qw/mp_strand mp_ref_id.rs_name mp_start mp_end
> mp_freq/],
>          join => [qw/mp_me_id mp_ref_id/],
>       }
>    );
>
>    while (my $hit = $rs->next()) {
>       my $freq = $hit->get_column('mp_freq');
>       $nHits += $freq;
>    }
>    return($nHits);
> }
>
> The equivalent SQL via DBI runs in ~1.5 min, but the above I killed
> after 30min. The DBI code is:
> my $sth = $dbh->prepare("SELECT mp_strand,rs_name,mp_start,mp_end,mp_freq
>                               FROM ngs_mappings
>                               JOIN ngs_map_exps on mp_me_id = me_id
>                               JOIN ngs_ref_seqs on mp_ref_id = rs_id
>                               WHERE me_sample like ?
>                            ") or die "ERROR - prepare() statement
> failed: ", $dbh->errstr();
> $sth->execute($sample) or die "ERROR - execute() statement failed: ",
> $dbh->errstr;
> while (my $row = $sth->fetchrow_arrayref()) {
>    $nHits += $row->[4];
> }
> $sth->finish();
>
> Any pointers appreciated.
> Cheers,
>
> Chris


As you are creating an object $hit (NgsMappings with it's relationships
- NgsMapExps & NgsRefSeqs) for each row - and if you have a huge number
of rows returned it increases the time.

But if this is real-life code, there's no need to iterate over each row
just to sum up mp_freq, just let your DB to do the job:

SELECT SUM(mp_freq) AS Counts
                              FROM ngs_mappings
                              JOIN ngs_map_exps on mp_me_id = me_id
                              JOIN ngs_ref_seqs on mp_ref_id = rs_id
                              WHERE me_sample like ?


sub getCounts ($) {
   my $self = shift;
   my $dataset = shift;

   croak "ERROR - dataset name is required\n" unless ($dataset);

   my $nHits;

   # make SQL search both technical replicates at the same time
   $dataset =~ s/t\d/t%/;
   my $rs = $self->resultset('NgsMappings')->search(
      {
         'me_sample' => {'like', $dataset},
      },
      {
         select => [{ sum => 'mp_freq' }],
         as => [ 'nHits' ],
         join => [qw/mp_me_id mp_ref_id/],
      }
   );
      return ($rs->first->get_column('nHits'));
}

You can find more information here:

http://search.cpan.org/~frew/DBIx-Class-0.08122/lib/DBIx/Class/Manual/Cookbook.pod#Getting_Columns_Of_Data


Cheers,
Alex




More information about the DBIx-Class mailing list