[Catalyst-dev] Paging Ash Berlin

Sat Oct 11 21:03:37 BST 2008

On Sat, Oct 11, 2008 at 12:31 PM, John Goulah <jgoulah at gmail.com> wrote:
> On Fri, Oct 10, 2008 at 6:17 PM, J. Shirley <jshirley at gmail.com> wrote:
>> On Fri, Oct 10, 2008 at 2:22 PM, Ash Berlin <ash_cpan at firemirror.com> wrote:
>>>
>>> On 10 Oct 2008, at 20:46, J. Shirley wrote:
>>>
>>>>
>>>>
>>>> I really really really don't want to start a (very much so offtopic)
>>>> flamewar, but I would like to get a discussion going about this versus
>>>> TheSchwartz.  It seems roughly similar (at least in function).
>>>
>>> TBH one of the reasons I avoided TheSchwartz was that I couldn't work out
>>> what was going on. I did feel kinda iffy about wheel re-invention here, but
>>> there was something about TheSchwartz when I looked at it that didn't sit well
>>> with me. Can't remember what it was anymore.
>>>
>>>>
>>>>
>>>> Here are the features that TheSchwartz has that I didn't see in
>>>> MooseX::JobQueue (and yes, please name it something other than
>>>> MooseX::JobQueue)
>>>>
>>>> The following are handled because of Data::ObjectDriver, but want to
>>>> list them as features anyway:
>>>> 1. Partitioning of jobs in the database
>>>> 2. Built-in replication handling
>>>
>>> Not really sure what these two things are? Shouldn't replication be done at
>>> a DB level? Partitioning - as having jobs live in two different tables/DBs?
>>> If so then App::JobQueue (let's call it that for lack of a better
>>> alternative) does that.
>>>
>>
>> Well, I mean horizontal partitioning.  So, automatic partitioning
>> based on some algorithm (like "if job->id % 2 => use this cluster").
>>
>> I didn't realize it did that... couldn't find that bit.
>>
>> As far as replication goes, DBIC handles some replication schemes but
>> there isn't the same support that D::OD has.  I'm not championing
>> D::OD at all here, I prefer DBIC for all things; however D::OD has a
>> lot of code to support multiplexing and caching that DBIC hasn't
>> culled yet.
>>
>> So, while replication happens at the database layer, the interactions
>> there require client-side behaviors, such as reading from slaves and
>> writing to masters.  DBIC already has basic slave/master support
>> but without support for slave read-delay (which is unfortunately
>> application specific in most cases) App::JobQueue won't have that...
>>
>> Which means worse replication support than TheSchwartz.
>
>
> This is correct.  When it was put in production on several servers
> under a replicated DBIC, the job locking went a bit haywire (I believe
> when slaves got delayed), and we had to point all queries at the
> master.  Otherwise it does scale beautifully to multiple machines.
> I wonder what the best solution is here.
>
> John
>

I've spent a great deal of time thinking about it in the past and the
best solution I ever came up with was wrapping it in transactions when
you do a write and need to read the up-to-date information (meaning
that in a transaction, the read source is always the write source,
period.)
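
A minimal sketch of that rule, written in Python rather than Perl for
brevity.  The class and method names (ReplicatedStorage, reader,
transaction) are hypothetical and are not DBIC or D::OD API; the
"connections" are plain strings, and no real BEGIN/COMMIT is issued.  It
only shows the routing decision: inside a transaction, reads go to the
write source.

```python
import contextlib
import itertools


class ReplicatedStorage:
    """Hypothetical router: reads go to replicas, except inside a transaction."""

    def __init__(self, master, replicas):
        self.master = master
        # Round-robin over the replicas; assumes at least one replica.
        self._replicas = itertools.cycle(replicas)
        self._txn_depth = 0

    @contextlib.contextmanager
    def transaction(self):
        # A real implementation would also BEGIN/COMMIT on the master here.
        self._txn_depth += 1
        try:
            yield self.master
        finally:
            self._txn_depth -= 1

    def writer(self):
        return self.master

    def reader(self):
        # Inside a transaction the read source is always the write source.
        if self._txn_depth > 0:
            return self.master
        return next(self._replicas)
```

Outside a transaction, reader() hands back replicas in turn; inside one,
it always hands back the master, so read-delay on the slaves cannot leak
into a write-then-read sequence.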

It does restrict some flexibility in the application, but I believe
that it is worth it for a few reasons.  Mostly, it keeps the
application structure sane (and also thins controllers naturally).
You can put in an intermediate "caching" (or rather, data access)
layer that gets updated through a single API, so you have better
testability.  It ends up being slightly more code, which is slightly
slower, but it scales near-linearly that way.

In the context of a job queue, the slave needs to access the most
up-to-date information on the job status (to make sure that there
isn't competition) so there will always be a read on the master to
determine the job state.  After that, to query any other information
you could query a slave and disregard any read-delay, since in theory
once the job is assigned to a worker, it shouldn't be written to
except by that worker (or the master that marks the worker as
stalled/dead).
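
As a rough illustration of the "always check job state on the master"
step, here is a sketch using Python's sqlite3 module as a stand-in for
the master database.  The jobs table, its columns, and the claim_job
function are assumptions made for the example; they are not
TheSchwartz's or App::JobQueue's actual schema.  The check and the claim
are a single conditional UPDATE, so two workers cannot both win.

```python
import sqlite3


def claim_job(master, job_id, worker):
    """Try to claim a pending job on the master; return True on success.

    `master` is a connection to the write source.  The status check and
    the status change happen in one UPDATE, so only one worker can match
    the 'pending' row.
    """
    with master:  # sqlite3 connection as context manager: commit or roll back
        cur = master.execute(
            "UPDATE jobs SET status = 'running', worker = ? "
            "WHERE id = ? AND status = 'pending'",
            (worker, job_id),
        )
    return cur.rowcount == 1
```

Once claim_job returns True, the remaining job details could be read from
a slave, on the assumption stated above that nobody else writes to a
claimed job.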

One other problem I ran into with TheSchwartz is that a job would
occasionally hang past its execution time, triggering further runs
that stacked on top of each other.  So, sending a signal to notify the
working child that its execution time is up would be very nice.  That
way it can back out, stop working, and exit gracefully rather than
have two competing workers on the same resource (just thinking of
parallelization cases for master/slave scaling).
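
The signalling idea could look roughly like the sketch below (Python,
POSIX only, since it uses os.fork).  The names run_with_deadline and
TimeUp, the choice of SIGTERM, and the exit code 3 are all assumptions
for illustration; this is not how TheSchwartz works.  The supervisor
forks a worker, waits up to the deadline, then signals it; the worker
turns the signal into an exception so it can back out before exiting.

```python
import os
import signal
import time


class TimeUp(Exception):
    """Raised inside the worker when the supervisor signals a timeout."""


def _on_term(signum, frame):
    raise TimeUp()


def run_with_deadline(job, deadline):
    """Fork a worker for `job`; send SIGTERM if it outlives `deadline` seconds.

    Returns the worker's exit code: 0 = finished, 3 = backed out on
    timeout, 1 = job raised some other error.
    """
    pid = os.fork()
    if pid == 0:  # worker child
        signal.signal(signal.SIGTERM, _on_term)
        try:
            job()
            os._exit(0)
        except TimeUp:
            # Release locks / mark the job as abandoned here, then exit.
            os._exit(3)
        except BaseException:
            os._exit(1)
    # Supervisor: poll until the child exits or the deadline passes.
    end = time.monotonic() + deadline
    while time.monotonic() < end:
        done, status = os.waitpid(pid, os.WNOHANG)
        if done:
            return os.waitstatus_to_exitcode(status)
        time.sleep(0.01)
    os.kill(pid, signal.SIGTERM)
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)
```

A supervisor would only hand the job to another worker after this call
returns, which is what keeps two workers off the same resource.  A signal
that arrives before the child has installed its handler would kill it
outright; the sketch accepts that small window.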

-J