[Dbix-class] Soliciting Design Input for an Object Caching Module

Wed May 16 00:15:17 GMT 2007

On 5/12/07, Matt S Trout <dbix-class at trout.me.uk> wrote:
> On Sat, May 12, 2007 at 03:31:12AM +0100, Dave Cardwell wrote:
> > Hello list.
> >
> > I'd like to solicit your thoughts on the appropriate architecture of a
> > DBIx::Class module for caching objects, similar to the functionality
> > provided by Data::ObjectDriver [1].
> >
> > mst has made me aware that there are several existing, private
> > implementations so I would be particularly interested in those
> > developers' input on authoring a solution for general release.
>
> I think we actually want to look at more than one layer of caching :)
>
> A resultset plugin that allows you to share the cached result of a particular
> query across processes would be useful.
>
> The D::OD cache functionality requires two things -
>
> (1) An ability to cache fetches by PK
> (2) An ability to effectively expire caches on change
>
> I think we can probably achieve both by indirecting both result and resultset
> operations via the resultsource object - it's the last thing that really
> understands the PKs, uniques etc. (the storage object doesn't really and
> I think shouldn't). That would then allow the source to fill caches "on the
> way through" - and also to make simple fetches only fetch the primary key
> and then fill results either from cache or via a 'pk IN (...)' second select
> (which is admittedly a gamble that you mostly hit cache but the idea here is
> that you -do- mostly hit cache). Plus when updates occur the resultsource can
> clear the caches (and in some cases update them) appropriately.

While I think caching at the DBIx::Class level will be useful for some
people, I would think that a lot of users would leave it turned off
for consistency reasons.  Even if you're invalidating the cache
locally on update/delete, that's local to one process.  Caches in
other [ithreads, processes, servers, datacenters] won't be
invalidated, and you get inconsistent views of the data.
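
To make the failure mode concrete, here is a tiny sketch (Python rather
than Perl, just for brevity).  Everything in it is made up for
illustration: LocalCachedSource is not a DBIx::Class API, and a plain
dict stands in for the database.

```python
# Hypothetical illustration: two "processes", each with its own private
# cache, in front of one shared database (a plain dict here).
database = {1: "alice"}


class LocalCachedSource:
    def __init__(self, db):
        self.db = db
        self.cache = {}  # private to this process

    def find(self, pk):
        # Read-through: fill the local cache on a miss.
        if pk not in self.cache:
            self.cache[pk] = self.db[pk]
        return self.cache[pk]

    def update(self, pk, value):
        self.db[pk] = value
        # Invalidates only *this* process's cache entry.
        self.cache.pop(pk, None)


proc_a = LocalCachedSource(database)
proc_b = LocalCachedSource(database)

proc_a.find(1)
proc_b.find(1)
proc_a.update(1, "alicia")

fresh = proc_a.find(1)  # re-read from the database
stale = proc_b.find(1)  # still served from proc_b's old cache entry
```

proc_a sees its own write, while proc_b keeps returning the old value
until something tells it to drop the entry.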

What would be more interesting to me is a generic interface between
DBIx::Class sources and memcached, so that one can just "turn on"
memcached support and give it a few config parameters (where the
memcached servers are, etc.).  This solves the cache coherency issue
at the [ithreads/processes/servers] level.  People sharing databases
across remote datacenters either can't use it or need to come up with
something better, as memcached across a WAN probably doesn't make much
sense in most scenarios.
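
Roughly what I have in mind, combined with the "fetch PKs, then fill
from cache or a 'pk IN (...)' select" idea quoted above.  This is only a
sketch under assumptions: SharedCache is a dict-backed stand-in for a
memcached client (get_many/set_many/delete are hypothetical names, not
any real client's API), CachedSource is not a DBIx::Class class, and
the users table is invented.  Python and sqlite3 are used just to keep
it self-contained.

```python
import sqlite3


class SharedCache:
    """Stand-in for a memcached client shared by every process."""

    def __init__(self):
        self._data = {}

    def get_many(self, keys):
        return {k: self._data[k] for k in keys if k in self._data}

    def set_many(self, mapping):
        self._data.update(mapping)

    def delete(self, key):
        self._data.pop(key, None)


class CachedSource:
    def __init__(self, conn, cache):
        self.conn = conn
        self.cache = cache
        self.row_queries = 0  # counts full-row selects, for illustration

    def search(self, where_sql, params=()):
        # Step 1: fetch only the primary keys for the query.
        # (where_sql is trusted here; this is an illustration only.)
        pks = [r[0] for r in self.conn.execute(
            f"SELECT id FROM users WHERE {where_sql} ORDER BY id", params)]
        # Step 2: fill what we can from the shared cache.
        rows = self.cache.get_many(pks)
        missing = [pk for pk in pks if pk not in rows]
        # Step 3: one 'pk IN (...)' select for the misses.
        if missing:
            marks = ",".join("?" * len(missing))
            self.row_queries += 1
            fetched = {r[0]: r for r in self.conn.execute(
                f"SELECT id, name FROM users WHERE id IN ({marks})", missing)}
            self.cache.set_many(fetched)
            rows.update(fetched)
        return [rows[pk] for pk in pks]

    def update_name(self, pk, name):
        self.conn.execute(
            "UPDATE users SET name = ? WHERE id = ?", (name, pk))
        self.cache.delete(pk)  # expiry is visible to every cache client


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(1, "alice"), (2, "bob")])

cache = SharedCache()
proc_a = CachedSource(conn, cache)
proc_b = CachedSource(conn, cache)

first = proc_a.search("id > ?", (0,))   # all misses: one IN select
second = proc_b.search("id > ?", (0,))  # all hits: no row select
proc_a.update_name(1, "alicia")
third = proc_b.search("id > ?", (0,))   # only row 1 is refetched
```

Because both sources talk to the same cache, the delete in update_name
is seen by proc_b as well, which is the coherency property a purely
local cache lacks.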

> The second thing we want to steal from D::OD is the ability to distribute
> fetches across partitioned databases. I'm currently torn as to whether this
> is better happening at the source or storage level - I -think- we probably
> want to put this logic in the resultsource as well, since the choice of
> partition is linked tightly into a level of data definition that again the
> storage doesn't need to know about.
>
> My thought would be to have a composite source that talks to multiple
> underlying source objects, one per partition, and for those to refer back to
> a partition schema object with an appropriate storage object.

I think that sounds like a sane plan.  I guess partitioned data either
must not have relationships, or must keep relationships local to the
partition (for example, you have no inter-user relationships, and the
data in all other tables has an FK to the user, so you partition on
username).  I think it would be extremely difficult for us to try to
emulate joins across partitions.
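
A minimal sketch of the "partition on username" case, again in Python
for brevity.  PartitionedSource and its methods are hypothetical names,
the partitions are plain dicts rather than real storage objects, and
hashing with crc32 is just one assumed way to pick a partition.

```python
import zlib


class PartitionedSource:
    """Composite source that routes every row by its owning username."""

    def __init__(self, partitions):
        self.partitions = partitions  # one dict per partition

    def partition_for(self, username):
        # crc32 is stable across runs, unlike Python's built-in hash()
        # for strings.
        return zlib.crc32(username.encode("utf-8")) % len(self.partitions)

    def insert(self, table, username, row):
        store = self.partitions[self.partition_for(username)]
        store.setdefault(table, []).append({"username": username, **row})

    def search(self, table, username):
        store = self.partitions[self.partition_for(username)]
        return [r for r in store.get(table, []) if r["username"] == username]


source = PartitionedSource([{}, {}, {}])
source.insert("users", "brandon", {"city": "Austin"})
source.insert("posts", "brandon", {"title": "hello"})

idx = source.partition_for("brandon")
posts = source.search("posts", "brandon")
```

Since every table is keyed by the same username, a user's rows all land
in one partition, so a user-to-posts relationship never has to cross
partitions.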

-- Brandon