[Catalyst] C::P::PageCache patch for reducing duplicate processing

Wade.Stuart at fallon.com Wade.Stuart at fallon.com
Fri Jun 23 21:10:26 CEST 2006

catalyst-bounces at lists.rawmode.org wrote on 06/23/2006 11:04:07 AM:

> Wade.Stuart at fallon.com wrote:
> >
> >
> >
> >
> >> Wade.Stuart at fallon.com wrote:
> >>>
> >>>
> >>>
> >>>> Perrin Harkins wrote:
> >>>>> Toby Corkindale wrote:
> >>>>>> One of the aims of my patch is to avoid the expense of having
> >>>>>> numerous processes produce the same code simultaneously. So on the
> >>>>>> initial bunch of requests, I'll still try and have the code delay
> >>>>>> all-but-one of them from building the page.
> >>>>> Presumably this only happens when a new cached page suddenly becomes
> >>>>> available and is instantly in high demand.  It's not very frequent.
> >>>>> In my opinion, that isn't worth the danger of messing with something
> >>>>> as potentially troublesome as locks that block page generation, but
> >>>>> I suppose no one is forced to use the locking.
> >>>> Good point.
> >>>> I'll try and implement the features so they can be enabled separately.
> >>> I will second the "I don't think it is worth it" case.  99% of the time
> >>> caching is set at startup, and the only time the case you are coding
> >>> for is hit is on the first page load, when the second request comes in
> >>> for the same page before the page build from the first hit is done.
> >>> It seems like such an outside case that I would be against all that
> >>> extra locking and special-case code, even if it is an option.
> >> Could this condition be triggered by the user hitting "Reload" or "Go"
> >> many times while waiting for the page?
> >
> > Yes, my case statement was general on purpose.  If a user or multiple
> > users make multiple requests for the page, and those requests are the
> > first ones that happen after the server is started, multiple builds
> > would happen
>
> not only when the server is (re)started, but also when the cached page
> expires

No, cache expiry/rebuild is covered by the algorithm I submitted earlier --
only one rebuild happens, during which other requests serve the old copy.
After that one rebuild is done, the new copy is served for all requests
until the next expiry period starts the process over again.  The scope of
the potential problem is truly limited to:

Multiple hits, arriving faster than the initial build of the page, to a
page that has not been hit and cached since the server was restarted.

>
> > until the first build is done and stored in cache.  After the cache is
> > populated there is no window of opportunity for the case to exist
> > (given my non-blocking method is implemented).  That seems to me like
> > very minimal exposure and an acceptable "startup cost" for a site.
>
> I agree.
>
> >
> > Throwing in locking and all that baggage to avoid the outside case (or
> > all the logic to allow the option of the locking) just seems to go
> > against KISS.
> >
> >
>
> I agree on this point too.
>
> I also think that the problem at hand could show up only on the most
> requested pages of a heavy-traffic site.

True, and only after a service restart.


> Also, it would have a significant impact only on undersized hardware
> IMHO, because it would cause a spike in memory and CPU utilization that
> would cease as soon as the first copy of the page is produced, as you
> said.
>
> In other words, I think some benchmarks would be required to know how
> much a problem this... problem really is.

I will not argue with benchmark results, but you can also just logically
deduce the maximum exposure -- and in this case it is small.

In addition to these changes, I may also start looking into two more cache
types for Cat.

1: A session-based cache, so that heavily user-customized pages can be
cached uniquely per session.  This would allow apps with session-specific
pages to use caching, which is not really possible with PageCache right
now as I read the code.  I see this as a big win: currently you need to
keep heavy (meaning expensive) customized pages cheap to build because no
cache is available -- with a cache you can build only once per 15 seconds
(or whatever) and afford a heavier page.
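The core of that idea is just widening the cache key to include the session. A small Python sketch (the function name, key layout, and `pagecache:` prefix are made up for illustration; they are not the plugin's API):

```python
import hashlib

def session_cache_key(session_id, path, query=""):
    """Derive a per-session page-cache key.  A path-only key would let
    one user's customized page be served to another; folding in the
    session id keeps each user's cached copy separate."""
    raw = "%s|%s|%s" % (session_id, path, query)
    return "pagecache:" + hashlib.sha1(raw.encode()).hexdigest()
```

The trade-off is obvious but worth stating: a per-session key multiplies cache entries by the number of active sessions, so expiry times and cache size limits matter much more than with a global page cache.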

2: A pagecache daemon system that rebuilds expired cache entries from a
queue behind the scenes, inserting fresh copies into the cache.  This
would prevent blocking on the hit that comes after the expiry period
while the long-building page is built.
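A hypothetical sketch of that daemon idea in Python (the real thing would be a separate Perl process; `cache`, `serve`, `rebuild_worker`, and the builder registry are all invented for illustration): request handlers keep serving the expired copy and merely enqueue the key, while a worker rebuilds pages and refreshes the cache, so no request ever blocks on a slow build.

```python
import queue
import threading
import time

cache = {}                    # key -> (body, expires_at)
rebuild_queue = queue.Queue() # keys whose pages need rebuilding

def rebuild_worker(builders, ttl=15):
    """Drain the queue, running the slow page builds off the request path."""
    while True:
        key = rebuild_queue.get()
        if key is None:       # shutdown sentinel
            break
        body = builders[key]()            # the expensive build happens here
        cache[key] = (body, time.time() + ttl)
        rebuild_queue.task_done()

def serve(key):
    """Always return immediately; expired entries are refreshed behind
    the scenes.  (A fuller version would deduplicate queued keys.)"""
    body, expires_at = cache[key]
    if time.time() >= expires_at:
        rebuild_queue.put(key)            # enqueue; keep serving stale copy
    return body
```

The design choice here matches the thread's conclusion: instead of locking requests out while a page rebuilds, staleness is tolerated for one rebuild interval in exchange for never blocking a hit.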




More information about the Catalyst mailing list