[Dbix-class] forking within a single DBIx::Class transaction -- possible?

Wed Oct 14 07:29:06 GMT 2009

Hi Toby,

Thank you. After writing my post yesterday I read further and saw that
multiprocessing and multithreading need a new DBI connection for each
process/thread. DBIx::Class is nice in that it automatically notices a
new process/thread context, sets $dbh->{InactiveDestroy}, disconnects,
and reconnects with a new dbh.  Unfortunately, as you said, that means
the requirement to scope everything in one db transaction across all
the child processes simply cannot be met.

I started refactoring the code just as you recommended: all of the
intensive processing is done in parallel, and the child processes
store their results in a shared memory data structure (Cache::FastMmap
for example).  Only after all children have finished does the parent
create a new DBIx::Class schema and open a transaction to do the
database work.
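
For the archives, a minimal sketch of that flow. It is not my actual
code: it assumes plain fork() with pipes and Storable in place of
Parallel::Forker and Cache::FastMmap, and the helper names
(process_in_children, store_results), the 'Result' resultset and its
'value' column are all made up for illustration.

```perl
use strict;
use warnings;
use POSIX ();
use Storable qw(store_fd fd_retrieve);

# Hypothetical helper: run $worker->($item) for each item in its own child
# process and hand the results back to the parent over a pipe.
# The children never touch a database handle.
sub process_in_children {
    my ($worker, @items) = @_;
    my @children;    # [ pid, item index, read end of that child's pipe ]

    for my $i (0 .. $#items) {
        pipe(my $reader, my $writer) or die "pipe failed: $!";
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;

        if ($pid == 0) {
            # Child: compute, serialize the result to the parent, exit.
            close $reader;
            my $ok = eval {
                my $result = $worker->($items[$i]);
                store_fd({ result => $result }, $writer);
                close $writer or die "close failed: $!";
                1;
            };
            # _exit skips END blocks and destructors inherited from the parent.
            POSIX::_exit($ok ? 0 : 1);
        }

        close $writer;
        push @children, [ $pid, $i, $reader ];
    }

    # Parent: collect every child's result and note any failures.
    my (@results, @failed);
    for my $child (@children) {
        my ($pid, $i, $reader) = @$child;
        my $payload = eval { fd_retrieve($reader) };
        close $reader;
        waitpid($pid, 0);
        if ($? == 0 && $payload) { $results[$i] = $payload->{result} }
        else                     { push @failed, $i }
    }
    die "children failed for items: @failed\n" if @failed;
    return @results;
}

# Parent only: write everything inside one transaction. txn_do rolls the
# transaction back if the coderef dies. 'Result' and 'value' are
# hypothetical names.
sub store_results {
    my ($schema, @results) = @_;
    $schema->txn_do(sub {
        $schema->resultset('Result')->create({ value => $_ }) for @results;
    });
}
```

The parent would then do something like
store_results(MyApp::Schema->connect($dsn, $user, $pass),
process_in_children(\&crunch, @chunks)), where MyApp::Schema, crunch
and @chunks are placeholders.  Because process_in_children dies if any
child fails, the transaction is never even opened in that case.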

leandro

On Wed, Oct 14, 2009 at 6:59 AM, Toby Corkindale
<toby.corkindale at strategicdata.com.au> wrote:
> Leandro Hermida wrote:
>>
>> Hi everyone,
>>
>> Been a long time since I've posted on this list, but been using
>> DBIx::Class for a couple years now and love it... great software.
>>
>> Anywho, I've written this code which does parallel processing (using
>> Parallel::Forker) within a single DBIx::Class transaction.  Something is not
>> working, as it throws lock wait timeout errors.  I want to know, is it
>> possible to use fork() in general within a single DBIx::Class
>> transaction?  Each of my child processes is working on different data in the
>> database, but I want to roll back everything if something fails in any child.
>
> Hi Leandro,
> I'm afraid it really won't work; the database connection is not designed to
> be arbitrarily multiplexed like that.
> At best it won't work, and at worst you'll get horrible data corruption.
>
> Also, even if it worked, the database performance will drop if you have
> multiple simultaneous queries. (Since the DB has to make sure your queries
> are not interfering with each other.)
>
> So unless your processing is particularly CPU intensive, and you have a
> multi-core system, then I recommend doing it all in a single process.
>
> If your processing *does* meet those requirements, then look into a
> different methodology. Try retrieving the data, then giving the raw data to
> your children to process, then take back the results, aggregate them, and
> store them to the DB in the parent.
>
>
> If you're doing the sort of large dataset processing that needs that
> behaviour, then you may want to look into using the Apache Hadoop framework.
>
>
> Cheers,
> Toby