[Dbix-class] Replicated Storage branch ready for review/discussion

Tue May 13 00:21:01 BST 2008

Hey,

The 0.08 replication_redux branch has stabilized and is ready for review and comments.  This branch is basically a backwards-incompatible rewrite of the original DBIC::Storage::DBI::Replication storage class, so I want to document the major changes and the reasons for them.  Also, while working on this branch I fixed a bunch of issues unrelated to replication, and wanted to let people know about that.

DBIC::Storage::DBI::Replicated is an alternative storage engine that:

  - Splits read/write queries over two different storages while delegating to both storages when necessary (primarily during instantiation, so that all storages connect properly).

  - Defines a pool mechanism to hold a list of storages, set individual storages as active or not, and to validate the status and lag time of storages that are slaves in a replicated environment.

  - Defines a Balancer storage that can use various strategies to spread query load across a Pool.  It also defines a mechanism to automatically revalidate the Pool at a set interval, in seconds.

  - Defines a Replicant storage type that adds some functionality specific to replicant storages, such as an attribute that tracks whether the storage is active, and some additional debug output so that you can see which storage is handling the request.
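
The read/write split described in the list above can be pictured with a toy sketch.  The class names (ToyStorage, ToyReplicated) and the "starts with SELECT" test are assumptions made up for illustration; they are not the actual DBIC::Storage::DBI::Replicated code or API.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a single storage; it only records queries.
package ToyStorage;
sub new {
    my ($class, %args) = @_;
    return bless { name => $args{name}, log => [] }, $class;
}
sub run {
    my ($self, $sql) = @_;
    push @{ $self->{log} }, $sql;
    return $self->{name};    # report which storage handled the query
}

# Hypothetical stand-in for the replicated storage.
package ToyReplicated;
sub new {
    my ($class, %args) = @_;
    return bless { master => $args{master}, replicant => $args{replicant} }, $class;
}
# Reads go to the replicant, everything else goes to the master.
sub run {
    my ($self, $sql) = @_;
    my $target = $sql =~ /^\s*select\b/i ? $self->{replicant} : $self->{master};
    return $target->run($sql);
}
# Some calls are delegated to both storages, for example connecting.
sub connect_all {
    my ($self) = @_;
    return map { $_->run('CONNECT') } @{$self}{qw(master replicant)};
}

package main;

my $storage = ToyReplicated->new(
    master    => ToyStorage->new(name => 'master'),
    replicant => ToyStorage->new(name => 'replicant'),
);
print $storage->run('SELECT * FROM artist'), "\n";                     # prints "replicant"
print $storage->run("INSERT INTO artist (name) VALUES ('x')"), "\n";   # prints "master"
```

A real implementation has to decide read versus write at the storage method level rather than by inspecting SQL text; the regex here only keeps the sketch short.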

The basic purpose of these classes is to support the common 'master/slaves' replication environment, where all data changing queries should be routed to the master, while all read queries are balanced over a pool of slaves.  This is a very common style of database scaling, so having good support for this in DBIC would be very valuable, particularly to companies that don't have the capital for hardware based balancers and the attendant monitoring software.

I chose to break this out from blackbox style balancers (like DBD::Multi) because I was having some driver specific issues that we've already solved with our list of database specific storages, such as DBIC::Storage::DBI::mysql, and because this gives us more fine tuned control over the storages.  For example, this system makes it easy to query information about a particular storage in the pool.  Also it should be easy to write a custom balancer, such as a round robin style balancer, or even a least connected balancer.  In general I think it makes sense to integrate this into DBIC.
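
To give a feel for what a round robin style balancer might involve, here is a minimal sketch.  ToyRoundRobin and the plain-hash replicants are hypothetical and do not reflect the branch's actual Balancer interface; the only idea borrowed from the text is that each storage in the pool carries an active flag.

```perl
use strict;
use warnings;

# Hypothetical balancer: cycles through the pool, skipping inactive entries.
package ToyRoundRobin;
sub new {
    my ($class, @replicants) = @_;
    return bless { pool => [@replicants], next => 0 }, $class;
}
# Return the next active replicant, or nothing if none is active.
sub next_storage {
    my ($self) = @_;
    my $pool = $self->{pool};
    my $size = scalar @$pool;
    for my $attempt (1 .. $size) {
        my $candidate = $pool->[ $self->{next} ];
        $self->{next} = ($self->{next} + 1) % $size;
        return $candidate if $candidate->{active};
    }
    return;
}

package main;

my @pool = map { +{ name => $_, active => 1 } } qw(slave1 slave2 slave3);
$pool[1]{active} = 0;    # pretend slave2 failed validation

my $balancer = ToyRoundRobin->new(@pool);
print join(', ', map { $balancer->next_storage->{name} } 1 .. 4), "\n";
# prints "slave1, slave3, slave1, slave3"
```

A least connected balancer would keep the same shape and only change how the candidate is picked.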

The test for this is t/93storage_replication.t, which defines a sqlite compatible test (using copy to fake replication) but allows you to override the master and replicant connect info so that you can test it on your own replicated environment.  I tested it against mysql native replication.

Places that could probably stand more abstraction are the system for splitting read/write queries, which is currently integrated into Replicated.pm, and the timer that the balancer uses to track when to validate the pool of slaves.  I actually did work on some separate query counter/timer event code, and will likely cut a branch shortly for it, so if anyone else is interested or could use something like that, please speak up.

Changes made to core DBIC classes include:

DBIC::Schema:

- changed the storage_type class accessor so that it can accept a hashref in addition to a string, in order to support storages that require args.

 Example:

$schema->storage_type({'Replicated' => \%options});
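
As a rough illustration of accepting either form, a helper along these lines could normalize the value.  split_storage_type and some_option are hypothetical names invented for this sketch; it is not the code in DBIC::Schema.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a storage_type value into (class name, args hashref).
sub split_storage_type {
    my ($storage_type) = @_;
    if (ref($storage_type) eq 'HASH') {
        # Hashref form: a single key (the class) mapped to its args.
        my ($class, $args) = %$storage_type;
        return ($class, $args);
    }
    # Plain string form: no extra args.
    return ($storage_type, {});
}

my ($class, $args) = split_storage_type({ 'Replicated' => { some_option => 1 } });
print "$class\n";    # prints "Replicated"
```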

DBIC::Storage::DBI:

- added two virtual methods, 'is_replicating', 'lag_behind_master', to support the replication pool validation feature.  Added support for these methods in the mysql specific storage.

I realize in a lot of ways it's not ideal for these methods to be part of the base DBI storage, since not all storages will be replicating.  Suggestions on a better way to abstract this functionality would be welcomed.
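
For context, pool validation built on those two methods might look roughly like the sketch below.  ToyReplicant, validate_pool and the maximum lag threshold are assumptions for illustration; only the method names is_replicating and lag_behind_master come from the branch.

```perl
use strict;
use warnings;

# Hypothetical replicant with canned answers instead of real database queries.
package ToyReplicant;
sub new {
    my ($class, %args) = @_;
    return bless { %args, active => 1 }, $class;
}
sub name              { $_[0]{name} }
sub active            { $_[0]{active} }
sub is_replicating    { $_[0]{replicating} }
sub lag_behind_master { $_[0]{lag} }

package main;

# Mark each replicant active only if it is replicating and its lag (in
# seconds) is within $max_lag; return the replicants that remain active.
sub validate_pool {
    my ($max_lag, @replicants) = @_;
    for my $r (@replicants) {
        my $ok = $r->is_replicating() && $r->lag_behind_master() <= $max_lag;
        $r->{active} = $ok ? 1 : 0;
    }
    return grep { $_->active } @replicants;
}

my @pool = (
    ToyReplicant->new(name => 'slave1', replicating => 1, lag => 2),
    ToyReplicant->new(name => 'slave2', replicating => 1, lag => 90),
    ToyReplicant->new(name => 'slave3', replicating => 0, lag => 0),
);
print join(', ', map { $_->name } validate_pool(30, @pool)), "\n";    # prints "slave1"
```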

Additionally, I made several small changes to the test suite to fix bugs I discovered when trying to run the core tests against my mysql replicating setup.  There were a couple of bad FK constraints that died on mysql, and I changed a test for a self-referential table so that it worked when the db actually enforced the constraints.  I wasn't able to get the entire test suite to run against mysql because there are a large number of tests that assume sqlite, but now if you write a new test you can target other databases and check that by overriding the DBICTEST_DSN/DBUSER/DBPASS environment variables.  I really recommend authors do this in the future, since it can only make our test coverage even better.
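
A test that honors those overrides could pick up its connect info along these lines.  test_connect_info is a hypothetical helper and the in-memory SQLite fallback is an assumption, not what the DBIC test library actually does.

```perl
use strict;
use warnings;

# Hypothetical helper: read connect info from the environment, falling back
# to an in-memory SQLite database when nothing is overridden (assumed default).
sub test_connect_info {
    my $dsn  = $ENV{DBICTEST_DSN}    || 'dbi:SQLite:dbname=:memory:';
    my $user = $ENV{DBICTEST_DBUSER} || '';
    my $pass = $ENV{DBICTEST_DBPASS} || '';
    return ($dsn, $user, $pass);
}

my ($dsn, $user, $pass) = test_connect_info();
print "connecting to $dsn\n";
```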

Along with this is a change I made to SQLT::Parser::DBIx::Class to better support mysql FK constraints that span several columns.  However, this will also require a patch to SQLT::Producer::Mysql to be fully fixed.  If anyone cares, talk to me about it.

I think the code is pretty clean and well documented (I even added POD to the test case) but please point out any trouble or areas of confusion.

Thanks for people's advice and thoughts on this so far.
John Napiorkowski
