[Catalyst] C::P::PageCache patch for reducing duplicate processing

Marcello Romani mromani at ottotecnica.com
Mon Jun 26 08:14:27 CEST 2006


Wade.Stuart at fallon.com wrote:
> 
> catalyst-bounces at lists.rawmode.org wrote on 06/23/2006 11:04:07 AM:
> 
>> Wade.Stuart at fallon.com wrote:
>>>
>>>
>>>
>>>> Wade.Stuart at fallon.com wrote:
>>>>>
>>>>>
>>>>>> Perrin Harkins wrote:
>>>>>>> Toby Corkindale wrote:
>>>>>>>> One of the aims of my patch is to avoid the expense of having
>>>>>>>> numerous processes produce the same code simultaneously. So on the
>>>>>>>> initial bunch of requests, I'll still try and have the code delay
>>>>>>>> all-but-one of them from building the page.
>>>>>>> Presumably this only happens when a new cached page suddenly becomes
>>>>>>> available and is instantly in high demand.  It's not very frequent.
>>>>>>> In my opinion, that isn't worth the danger of messing with something
>>>>>>> as potentially troublesome as locks that block page generation, but I
>>>>>>> suppose no one is forced to use the locking.
>>>>>> Good point.
>>>>>> I'll try and implement the features so they can be enabled
>>>>>> separately.
>>>>> I will second the "I don't think it is worth it" case.  99% of the
>>>>> time caching is set at startup, and the only time the case you are
>>>>> coding for is hit is on the first page load, if the second request
>>>>> comes in for the same page before the page build from the first hit
>>>>> is done.  Seems like such an outside case that I would be against
>>>>> all that extra locking and special-case code even if it is an option.
>>>> Could this condition be triggered by the user hitting "Reload" or "Go"
>>>> many times while waiting for the page?
>>> Yes, my case statement was general on purpose.  If a user or multiple
>>> users make multiple requests for the page, and the requests are the
>>> first ones that happen after the server is started, multiple builds
>>> would happen
>> not only when the server is (re)started, but also when the cached page
>> expires
> 
> No, cache expire/rebuild is covered by the algorithm I submitted earlier --
> only one rebuild happens, during which other requests serve the old copy.
> After the one rebuild is done, the new copy is served for all requests
> until the next expire period starts the process over again.  The scope of the
> potential problem is truly limited to:
> 
> Multiple hits, in which the timing of those hits is quicker than the
> initial build of the page, to a page that has not been hit and cached since
> the server was restarted.

This lowers the need for code that takes care of the special case, IMHO 
(production services are not supposed to be restarted very often).
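
For the sake of discussion, here is a minimal sketch of that expire/rebuild
idea as I understand it -- one process rebuilds while the others keep serving
the stale copy -- using a flock()-based lock file.  It is not your patch;
rebuild_page() and the file paths are just placeholders:

#!/usr/bin/perl
# Sketch only: one process rebuilds an expired page, the others keep
# serving the old copy until the fresh one is swapped in.
use strict;
use warnings;
use Fcntl qw(:flock);

my $cache_file = '/tmp/pagecache-demo.html';
my $lock_file  = '/tmp/pagecache-demo.lock';
my $ttl        = 15;                    # seconds a cached copy stays fresh

sub rebuild_page {                      # stands in for the expensive build
    sleep 2;
    return "<html><body>built at " . localtime() . "</body></html>\n";
}

sub slurp { local (@ARGV, $/) = @_; <> }

sub fetch_page {
    my $age = -e $cache_file ? time() - (stat $cache_file)[9] : undef;

    # Fresh copy: serve it directly.
    return slurp($cache_file) if defined $age && $age < $ttl;

    # Expired or missing: only the process holding the lock rebuilds.
    open my $lock, '>', $lock_file or die "lock: $!";
    if (flock $lock, LOCK_EX | LOCK_NB) {
        my $html = rebuild_page();
        open my $out, '>', "$cache_file.new" or die "write: $!";
        print {$out} $html;
        close $out;
        rename "$cache_file.new", $cache_file;   # atomic swap
        return $html;
    }

    # Someone else is rebuilding: serve the stale copy if there is one.
    return slurp($cache_file) if defined $age;

    # Very first build: nothing to serve yet, so wait for the builder.
    flock $lock, LOCK_EX;
    return slurp($cache_file);
}

print fetch_page();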

> 
>>> until the first build is done and stored in cache.  After the cache is
>>> populated there is no window of opportunity for the case to exist
>>> (given my non-blocking method is implemented).  That seems to me like
>>> very minimal exposure and acceptable "startup cost" for a site.
>> I agree.
>>
>>> Throwing in locking and all that baggage to avoid the outside case (or
>>> all the logic to allow the option of the locking) just seems to go
>>> against KISS.
>>>
>>>
>> I agree on this point too.
>>
>> I also think that the problem at hand could show up only on the most
>> requested pages of a heavy-traffic site.
> 
> True, and only after a service restart.
> 
> 
>> Also, it would have a significant impact only on undersized hardware
>> IMHO, because it would cause a spike in memory and CPU utilization that
>> would cease as soon as the first copy of the page is produced, as you
>> said.
>> In other words, I think some benchmarks would be required to know how
>> much of a problem this... problem really is.
> 
> I will not argue with benchmark results; you can always just logically
> deduce the maximum vector too -- in this case it is small.
> 
> In addition to these changes, I may also start looking into two more
> cache types for Cat.
> 
> 1: A session-based cache, so that heavily user-customized pages can be
> cached uniquely per session -- this would allow apps that have
> session-specific pages to use caching (which is not really possible with
> PageCache right now, as I read the code).  I see this as a big win because
> currently you have to keep heavy (meaning expensive) customized pages in
> check to lower their build cost, since there is no cache available -- with
> a cache you can build such a page only once per 15 seconds or whatever and
> have a heavier page.

Seems clever to me.
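
To make the idea concrete, here is a rough sketch of how a per-session cache
key could be composed.  Nothing below is existing PageCache API; in a Catalyst
app the three inputs would come from $c->sessionid (Catalyst::Plugin::Session),
$c->req->path and $c->req->uri->query.

use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# One cache slot per (session, path, query string) triple, so each user's
# customized copy of the page is cached separately.
sub session_page_key {
    my ($session_id, $path, $query) = @_;
    $session_id ||= 'anon';     # visitors without a session share one entry
    $query      ||= '';
    return md5_hex(join ':', $session_id, $path, $query);
}

print session_page_key('8d2c1f...', '/account/summary', 'page=2'), "\n";

The expiry handling would stay exactly as it is now; only the key changes.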

> 
> 2: A pagecache daemon system that rebuilds expired cache entries in a
> queue behind the scenes, inserting the results into the cache.  This would
> prevent the blocking on the hit that comes right after the expire period,
> while the long-building page is rebuilt.
> 

Just issue a request for an expired page, so that when the first real
client request for that page comes in, the page has already been cached.
(Just a thought :-)
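
Something along these lines, run from cron shortly before or just after the
expire period, could do it -- the URL list is made up for the example:

#!/usr/bin/perl
# Cache-priming sketch: request the expensive pages out of band so the
# first real visitor after an expiry never pays the rebuild cost.
use strict;
use warnings;
use LWP::UserAgent;

my @pages = (
    'http://localhost:3000/',
    'http://localhost:3000/news',
    'http://localhost:3000/reports/heavy',
);

my $ua = LWP::UserAgent->new( timeout => 60 );

for my $url (@pages) {
    my $res = $ua->get($url);
    warn "warm-up request failed for $url: ", $res->status_line, "\n"
        unless $res->is_success;
}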



-- 
Marcello Romani
Responsabile IT
Ottotecnica s.r.l.
http://www.ottotecnica.com


