[Dbix-class] DBIx::Class::Fixtures suggestions/patches

Mon May 19 10:11:38 BST 2008

(warning long post)

I have written a Fixtures wrapper for my particular workflow for use
at $work, along with some customizations to D::C::Fixtures and wanted
to get opinions about pushing them to svn. To get a big performance
boost for population I'm now using the following procedure when
creating a dump:

1) Dump from source database using Fixtures
2) Populate dump into fixtures database
3) Run pg_dump from fixtures database into separate dump file
4) Delete fixtures dump dir (actually using File::Temp::tempdir)

sub dump
{
    my $self = shift;

    my $params = {...};
    $self->fixtures()->dump($params);

    # populate the data in to the fixtures db
    $self->fixtures()->populate({
        directory => $data_dir,
        ddl       => 'name.sql'),
        post_ddl  => 'name_post.sql'),
        cascade   => 1,
        connection_details => [...],
    });

    my $dump_file = $self->_full_dump_file();
    unlink $dump_file if -e $dump_file; # pg_dump won't overwrite existing file
    system("pg_dump ...");
}

And population goes like:

1) clear fixtures database
2) import pg_dump file
3) run post_ddl bits

sub populate # (my wrapper)
{
    my $self = shift;

    # clear existing data
    my $schema = $self->fixtures()->_generate_schema({
        ddl       => 'name.sql',
        cascade   => 1,
        connection_details => [...'],
    });

    # import dump data
    $self->import_pg_dump_file($self->_full_dump_file());

    # add constraints/fk
    $self->fixtures()->do_post_ddl({
        schema => $schema, post_ddl => 'name_post.sql'),
    });
}

When I populate I just read in the pg_dump file, resulting in test
runs in 1/4 of the time (12-15 seconds to populate & run a test file
vs 45 seconds doing populate via DBIx::Class::Fixtures::populate().
This is not a criticism of Fixtures, because I love the ability to
easily specify what database tables/rows to dump. I was wondering
about creating an alternate set of dump/populate methods which would
encapsulate this workflow, but be db agnostic. It dramatically sped up
my tests, so much so that I'm likely to run them more often now. :-)


Some other changes I've made locally:

1. Move the post_ddl bits into a separate method. I then call this
method from my wrapper. Diff against latest 1.001/trunk:

--- Fixtures.pm	2008-05-16 14:50:12.000000000 +0100
+++ Fixtures.pm.proposed	2008-05-19 09:55:42.000000000 +0100
@@ -853,14 +853,7 @@
     $rs->populate(\@rows);
   }

-  if ($params->{post_ddl}) {
-    my $data = _read_sql($params->{post_ddl});
-    foreach (@$data) {
-      eval { $schema->storage->dbh->do($_) or warn "SQL was:\n $_"};
-  	  if ($@) { die "SQL was:\n $_\n$@"; }
-    }
-    $self->msg("- finished importing post-populate DDL into DB");
-  }
+  $self->do_post_ddl({schema=>$schema,
post_ddl=>$params->{post_ddl}}) if $params->{post_ddl};

   $self->msg("- fixtures imported");
   $self->msg("- cleaning up");
@@ -870,6 +863,18 @@
   return 1;
 }

+sub do_post_ddl {
+  my ($self, $params) = @_;
+
+  my $schema = $params->{schema};
+  my $data = _read_sql($params->{post_ddl});
+  foreach (@$data) {
+    eval { $schema->storage->dbh->do($_) or warn "SQL was:\n $_"};
+	  if ($@) { die "SQL was:\n $_\n$@"; }
+  }
+  $self->msg("- finished importing post-populate DDL into DB");
+}
+
 sub msg {
   my $self = shift;
   my $subject = shift || return;

2. Support for "grouping" in ddl files. I ran into a major problem
where some of my ddl files had stored procedures with semicolons. The
original _read_sql(), while clever and simple, would break horribly
with the embedded semicolons in my procedures. I added (crude) support
for grouping to avoid this problem. The DDL has comments which trigger
the feature.

--GROUPBEGIN
CREATE OR REPLACE FUNCTION booking_last_modified_lcupdate() RETURNS "trigger"
    AS $$ begin NEW.last_modified = now(); return NEW; end;  $$
    LANGUAGE plpgsql;
--GROUPEND

The diff (also against 1.001/trunk)

--- Fixtures.pm	2008-05-16 14:50:12.000000000 +0100
+++ Fixtures.pm.proposed	2008-05-19 10:02:34.000000000 +0100
@@ -726,11 +726,43 @@
   my $fh;
   open $fh, "<$ddl_file" or die ("Can't open DDL file, $ddl_file ($!)");
   my @data = split(/\n/, join('', <$fh>));
-  @data = grep(!/^--/, @data);
-  @data = split(/;/, join('', @data));
   close($fh);
-  @data = grep { $_ && $_ !~ /^-- / } @data;
-  return \@data;
+
+  my @commands;
+  my $group_started = 0;
+  my $buffer = '';
+  foreach my $line (@data)
+  {
+    chomp $line;
+    my ($group) = $line =~ /--GROUP(BEGIN|END)/;
+    $group ||= '';
+    if ($group) {
+      $group_started =  $group eq 'BEGIN' ? 1 : 0;
+    }
+
+    $line =~ s/--.*//; # remove comments
+
+    # when closing a group, flush on the "END" command
+    if ($group_started || $group) {
+      $buffer .= $line;
+      if ($group eq 'END') {
+        push @commands, $buffer;
+        $buffer = '';
+      }
+    }
+    else {
+      while ($line && $line =~ s/(.*;?)//) {
+        my $cmd = $1;
+        $buffer .= $cmd;
+        # warn "  [$cmd] [$line] [$buffer]\n";
+        if ($cmd =~ /;/) {
+          push @commands, $buffer;
+          $buffer = '';
+        }
+      }
+    }
+  }
+  return \@commands;
 }

 =head2 populate

Thoughts? (and thanks for reading this far)
-- 
----------------------------------------------------------------
 Drew Taylor                 *  Web development & consulting
 Email: drew at drewtaylor.com  *  Site implementation & hosting
 Web  : www.drewtaylor.com   *  perl/mod_perl/DBI/mysql/postgres
 ----------------------------------------------------------------