[Dbix-class] Replication Branch, Status and Call for Comments

Thu Jun 19 17:04:54 BST 2008

--- On Sun, 6/8/08, Matt S Trout <dbix-class at trout.me.uk> wrote:

> From: Matt S Trout <dbix-class at trout.me.uk>
> Subject: Re: [Dbix-class] Replication Branch, Status and Call for Comments
> To: "DBIx::Class user and developer list" <dbix-class at lists.scsys.co.uk>
> Date: Sunday, June 8, 2008, 2:04 PM
> On Sat, Jun 07, 2008 at 08:10:16AM -0700, John Napiorkowski
> wrote:
> > Hi,
> > 
> > So this is a call for comments on the outstanding
> replication_redux branch, which is scheduled to be merged
> back to trunk shortly.  This email is a summary of the
> features, differences from the existing version, and any
> issues or questions I am punting to the group.
> 
> No Moose please.
> 
> DBIx::Class uses two space indent, not four.
> 
> reload_row has to go - Storage never, -ever- sees a row
> object because that
> implentation is designed to be pluggable (and there may not
> even -be- a
> row object).
> 
> In any case, if I loaded something from a slave I might not
> want
> discard_changes to go to the master - you need to find some
> other way of
> specifying this. The only use case I can honestly see for
> this is when
> you've -just- inserted the row, in which case the
> find() would happen inside
> the transaction and should go to the master anyway.

I fixed the indenting issue.

I changed the reload_row method.  Now this method is localized to the row object itself.  However, several issues exist for me regarding the DBIC::PK->discard_changes method.  I strongly feel the way this method is implemented is leading to incorrect use.  It seems a lot of people are taking advantage of the fact it actually calls ->find to get whatever the storage has for the matching primary keys in order to do stuff like reloading a row after insert to get any field changes introduced by triggers or similar things.  To me, a method called 'discard_changes' merely means to revert a row state to whatever it was before any changes were introduced.  I wasn't around when this method was first cooked up, so maybe there is some reason for doing it this way, but to me the fact that the current implementation is doing a find is merely a side effect, not the primary function.  If we are going to rely on this behavior that means we could never optimize this
 function to do something like restore the fields from a cached version, etc.

So I feel if we need both behavior types there needs to be two methods.  So this is what I did:

1) In DBIC::Row I created a new method called "get_from_storage" which performs the single task of returning a new row object that matches the current in memory row object using ->find.  

2) I set DBIC:PK->discard_changes to use 'get_from_storage' so that it continues to work as is does now, including the questionable side effect.  This frees us to start having a discussion about the best way to clarify the meaning and effect of these methods.

3) In order to properly support Replication, I make sure the ->find that ->get_from_storage issues is wrapped in a transaction, since Replication forces all queries inside a transaction to go to the Master.  This way for all the current users of ->discard_changes that are expecting it to reliably return the current storage value continue to get what they want.

I hear what you are saying about letting the user choose whether ->discard_changes uses the master or is allowed to query a replicated storage.  Certainly for performance reasons you want to limit the number of times when select queries are forced to the Master storage.  However I really believe that you shouldn't have to write your code so differently in order to get replication working.  Also, I think it's wrong for components like Versioning, Partitioning, etc. to be required to explicitly check if the storage is replicated or not.  Down that path insanity lies.  Stuff you write for a single database would ideally run perfectly when you need   replication.  That's why we overload transactions in replication to force all queries to the master.  If you are using ->discard_changes as a way to 'validate' your Row object against the database, then it is always true that you want the master version, since replicated databases will always be a little behind.
  Same thing goes for the new ->get_from_storage method on the Row object.  So for the short term I think the approach I've written is the way to go.  Because this common use requires it and the other uses merely introduce a performance penalty of more queries to the Master.  If, in the future we can address the issue of how people are using ->discard_changes for the side effect maybe this can change.

For the Moose issue, I have to say this mistake occurred due to a miscommunication between us.  For some reason I thought you gave the go ahead for Moose in this project.  One of my coworkers here at the job also had the same feeling, so I know there must be some good reason I thought that.  However you didn't give that go ahead.  The problem now is that converting this from Moose is not trivial.  Coding in Moose is a very different style, so I'd pretty much have to toss out a lot of code and rewrite (and retest) from scratch.  Since I did this work as part of my paying job, and since we have this running in production right now there is no chance I am going to have the time alloted to do this.  Not only is it my time, but also the time of several others on my team that spent a lot of hours testing the current code.  At this point we've invested dozens of man/hours into writing and testing replication in order to make it stable.  So if you would prefer I
 can remove replicated storage from DBIC and make it into a stand alone set of components.  Then, when we are working on the Moose-DBIC port we can reintegrate.  I'm sure when you start that port I can give some hours to helping.

Then I can remove the Replication stuff from my replication_redux branch, and complete the updates to the test suite so that we can run the full test suite against alternative databases, such as mysql, postgresql, etc.  As I mentioned before, I started working on this because to properly test the replication code I needed to test it against a real replicating database, so I had to fix up the test suite enough to run the few tests related to replication.  Since it was mentioned to me that having the test suite fixed in this way is greatly desirable I am happy to finish that job.  I am about 75% of the way there.  That way also the work I did on this branch would be immediately useful to the community.

Since the last email about replication status I also made quite a few changes to how the replicated storage splits master and replicants, which seems to have fix a serious deadlocking issue we experienced with this in production.  All tests pass and give the same output as trunk.

Sincerely,
John Napiorkowski