[Dbix-class] "fork- and thread-safe"

Fri Jun 23 22:11:58 CEST 2006

On 6/23/06, Mark Hedges <hedges at ucsd.edu> wrote:
>
> I tried to explain on the Mason list that DBIx::Class is "fork-
> and thread-safe" in a discussion on how (why not) to cache a
> DBI connection at Apache start-up.
>
> It occurred to me I don't really know what this means, and I
> couldn't find in previous discussions or the DBIC::Storage::DBI
> man page what actually happens.
>

It's a rather complicated can of worms is what it is :)

To begin with, fork/thread-safety goes beyond just apache + $worker
issues and stages of the startup of such an app.  I for instance have
commandline utilities and long-running system daemons I've written
that use DBIx::Class and fork themselves whenever they feel it's
convenient.  Because of the DBIx::Class support for this, my
DBIC-related code doesn't need to know anything about that, or do
anything about it.  $schema just keeps working as expected after the fork
for both child and parent, even if one or the other exits.

The basic underlying issue is that if you get a $dbh via DBI->connect,
then fork off a child, the parent and child share the connection, and
they don't play nice with each other.  DBI documents that this is
unsafe, and in practice you can see warnings and errors from this (it's
a race thing, so sometimes you can get away with it for a little
while, but it will bite you eventually).  DBI makes no effort to
detect the situation or do anything about it, things just break.

Threads will also by default share a connection if you let them share
a $dbh, but instead of seemingly-working and then randomly failing
later, DBI will throw exceptions as soon as you try to touch the $dbh
from the wrong thread, and likely terminate your app/worker/whatever.

So, from the point of view of the application or module author who is
using DBI directly, one has to be careful, every time one forks (or
spawns a thread), to obtain a fresh new database handle for the new
process / thread to avoid problems.  You also have to be
careful what you do with the old one, as the $dbh destructor will
close the physical (by that I mean socket) connection.  So if you fork
off a child, and the child does "undef $dbh", this will kill the
parent's connection too by default.  The $dbh attribute
"InactiveDestroy" is used to work around that particular issue.
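
As a rough sketch of that discipline when using DBI directly (the
helper name run_in_child and the $connect / $work callbacks are made up
for illustration; only the InactiveDestroy attribute and ->disconnect
are DBI's own):

```perl
use strict;
use warnings;

# Hypothetical helper: fork, and give the child its own handle.
#   $dbh     - handle opened before the fork (the parent keeps using it)
#   $connect - coderef returning a fresh handle,
#              e.g. sub { DBI->connect($dsn, $user, $pass) }
#   $work    - coderef run in the child with the fresh handle
sub run_in_child {
    my ($dbh, $connect, $work) = @_;

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    return $pid if $pid;    # parent: carry on with the original $dbh

    # Child: mark the inherited handle so that its destructor leaves
    # the parent's socket alone, then get a connection of our own.
    $dbh->{InactiveDestroy} = 1;
    my $child_dbh = $connect->();
    $work->($child_dbh);
    $child_dbh->disconnect;
    exit 0;
}
```

The parent would call it as run_in_child($dbh, $connect, sub { ... })
and waitpid() on the returned pid.  $dsn, $user and $pass above are
placeholders for whatever your app connects with.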

For a generically-useful ORM like DBIx::Class, the larger issue is
that since we are not the application, we can't really know when or if
the app author is going to fork or thread.  I suppose if just *before*
any forking or threading operation the app author did a
$schema->storage->disconnect, that would solve the issue right there.
But they often won't, or don't know where to, or potentially don't
even have direct control of the forking/threading code (as is the case
with apache worker modules).

So the most robust answer was that we built support in DBIx::Class to
automatically detect that the process or thread context has changed
and take appropriate measures as necessary to use DBI safely and
correctly, which frees the user from ever having to worry about all
this crap.  You just use it, and it just works, and you can keep your
$schema across forks and threads just fine.

> If I connect in the startup.pl, does that mean that each forked
> child shares the connection?  (I'm guessing no.)
>

With straight DBI, yes, and that breaks things.  With DBIx::Class, the
first time each child tries to use their connection, they will first
detect that the PID has changed, then set InactiveDestroy, undef their
$dbh, and reconnect.  This is transparent to the user of DBIx::Class.
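
The idea, boiled down to a toy, looks something like the following.
This is only a sketch of the PID check described above, not the actual
DBIx::Class storage code; the class name ForkAwareHandle is invented,
and the thread case (comparing thread ids the same way) is left out:

```perl
use strict;
use warnings;

{
    package ForkAwareHandle;    # hypothetical, not a DBIx::Class class

    sub new {
        my ($class, $connect) = @_;    # $connect: coderef returning a new $dbh
        return bless { connect => $connect, dbh => undef, pid => $$ }, $class;
    }

    # Return a handle that is safe to use in the current process,
    # connecting lazily and reconnecting after a fork.
    sub dbh {
        my ($self) = @_;

        if ($self->{dbh} && $self->{pid} != $$) {
            # We are in a forked child still holding the parent's handle:
            # stop its destructor from closing the parent's socket, then
            # drop it.
            $self->{dbh}{InactiveDestroy} = 1;
            $self->{dbh} = undef;
        }

        $self->{pid} = $$;
        $self->{dbh} ||= $self->{connect}->();
        return $self->{dbh};
    }
}
```

Everything that wants to talk to the database goes through ->dbh, so
the check happens on first use after the fork rather than at fork time,
which is why no new connection is made until the child actually needs
one.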

> If I connect in startup.pl under an Apache2 threaded worker
> model, does that mean each mod_perl thread shares the
> connection?
>

Same answer as above - with straight DBI if you connect in startup.pl
you will have issues, but with DBIx::Class you can connect in
startup.pl and everything works fine.  Each new process and/or thread
gets its own connection (but won't actually make that connection until
it tries to access the database and "notices" that the pid/thread has
changed out from under it).

> How does this actually work under FastCGI?

It all depends on what FastCGI environment you're in and how you're
using it, but I think normally it's a non-issue for FastCGI as the
workers are all separate procs to begin with (spawned by the FCGI proc
manager).

> Is there any way for multiple processes or threads to really
> share a DBIC connection?

If by that you mean multiple procs/threads to share a DBI connection
via DBIC, not at the moment, but in theory yes.  There are modules out
there on CPAN that go about this by multiplexing the requests of many
procs/threads into a single "db worker" proc/thread which handles
everything via single connection (or a pool of connections, but the
important thing is n_threads > n_conns).  They use locking to make
sure only one proc/thread can really access a connection at a time.

We could make a Storage::DBI subclass that works similarly.  Note that
there are issues wrt transactions and other potentially
(accidentally) shared state between the multiplexed processes that
must be dealt with one way or another.

Personally, I don't think it's worth much in most normalish
real-world scenarios.  On reasonable platforms idle connections really
don't cost much at all.  Distributing the same txn load over 5 or 50
connections shouldn't really change much for most people - some
perhaps, but enough to be worth the added complexity?

But if someone finds themselves in a situation where this would be
beneficial, feel free to write the support for it, or bug someone else
to, or sponsor someone else to, etc :)

-- Brandon