[Catalyst-commits] r7265 - trunk/examples/CatalystAdvent/root/2007/pen

Tue Dec 11 05:35:53 GMT 2007

Author: jayk
Date: 2007-12-11 05:35:52 +0000 (Tue, 11 Dec 2007)
New Revision: 7265

Added:
   trunk/examples/CatalystAdvent/root/2007/pen/CacheFriendlyApps.pod
Log:
adding cache-friendly-apps article.

Added: trunk/examples/CatalystAdvent/root/2007/pen/CacheFriendlyApps.pod
===================================================================

--- trunk/examples/CatalystAdvent/root/2007/pen/CacheFriendlyApps.pod	                        (rev 0)
+++ trunk/examples/CatalystAdvent/root/2007/pen/CacheFriendlyApps.pod	2007-12-11 05:35:52 UTC (rev 7265)
@@ -0,0 +1,289 @@
+=head1 Making your Catalyst App Cache-friendly
+
+Web caches are very powerful and are used in many places throughout the
+internet. Most cache servers have complex builtin routines designed to guess
+how long it should cache a particular piece of content. This is because many
+web applications provide no information regarding the cache-ability of the
+content they generate. Most of these guessing-routines treat dynamically
+generated pages (anything with a B<?> in the URL) as completely un-cacheable.
+
+This is a tragedy, as wise usage of cache policy can do more for your
+web-application in terms of scalability than any other change you can make in
+your application.  Adding a front-end cache such as Squid to your web server
+can reduce your application-server load significantly while improving
+responsiveness to your users.
+
+I can already hear you saying to yourself: "If it's so powerful, why isn't it
+used more often?" The answer is simple. Most web developers seem to think that
+it's difficult to manage cache limits within a complex web application. In
+many cases they are right. Fortunately for us, Catalyst's many built-in
+features make setting up a fine-grained cache policy very easy.
+
+Today I'm going to give you the quick low-down on cache control and show you
+how to add a fine-grained cache policy to your application.
+
+=head2 How to Control your Cache
+
+The heart of web-cache management is the B<Cache-Control> HTTP header.  It is 
+a very powerful header and has a large number of options which you can read 
+about in the HTTP 1.1 RFC (if you are into that particular form of torture.)
+
+If, on the other hand, you would just like to hit the ground running, I'll show 
+you the two main options you need to know about.
+
+=head3 Don't cache this
+
+Every application is going to have data that is only valid at the time
+requested or for the user making the request. So it is important to know how
+to tell any caches along the way that the data you are sending should not be
+cached at all.
+
+The way you do this is to send the Cache-Control header with the value
+'no-cache':
+
+ Cache-Control: no-cache
+ 
+Doesn't get any simpler than that. Now let's move on to the one we are most
+interested in today...
+
+=head3 How long can this be cached?
+
+There are several ways to specify how long the page you are providing is
+valid. The 'Expires' header is often used to set a date and time whereupon the
+page will no longer be valid. This is a good header to use if you know exactly
+when a page will no longer be valid. In my experience, however, when working
+with web applications you more often want to indicate how long a piece of data
+is good for before it needs to be refreshed. Usually, this is in relation to
+the time the data is requested.
+
+In other words, you usually want to say 'this is valid for the next 10
+minutes, after that you should refresh.' You can still use the 'Expires'
+header for this, but there is a better way.
+
+The 'Cache-Control' header has a directive called 'max-age.' Put simply,
+'max-age' allows you to specify how long in seconds the page can be cached.
+This fits perfectly with what you most often want to do when working with a
+web-app. The max-age countdown always starts at the time the page is
+requested, so you avoid the issues of clock syncronization and drift that you
+can run into when using the 'Expires' header. A Cache-Control header
+specifying a maximum age of 15 minutes would be:
+
+ Cache-Control: max-age=900
+ 
+That's not so difficult, is it?
+
+=head2 Our madness
+
+Now that we know how to inform cache servers of our intentions, how do we
+integrate that into our Catalyst app. More importantly, how do we deal with
+the numerous paths through our application that might generate a page while
+still providing a sensible cache value?
+
+The path I usually take is to have a default maximum time limit for any given
+page in the application. This default does not need to be huge, even 1 hour
+(3600 seconds) is enough to drastically reduce the load on your web
+application (think 1 request for that resource per hour and you start to get
+the picture.)
+
+After you have decided on a default max time, you can reduce that amount as
+appropriate for any given action in your application - setting the news page
+to expire no less frequently than every 10 minutes, for example.
+
+The problem that remains, however, is that in a state of the art Catalyst
+application, very rarely does one action create an entire page. In most cases
+other actions are called in the process of handling the result. This may
+happen either by builtin features such as the Chained dispatch type, or by
+simply calling $c->forward() within your action code. Any one of these actions
+may cause an event which will change the cacheability of the returned page.
+This is both a common and desired practice, so handling of cache times needs
+to account for this.
+
+=head2 The Method
+
+The way we handle this is to simply work with the tools Catalyst provides to
+allow any action to specify it's own constraint on the cacheability of the
+page being generated.
+
+We do this by adding a hash to the stash called 'cachecontrol'. Each action
+can add an element to this hash containing the maximum time in seconds that
+the resultant page can be cached. By doing this, each action is essentially
+stamping a maximum-lifespan onto only the data it generated. This
+maximum-lifespan will apply regardless of the path taken to reach that
+particular section of code. An example of this is below:
+
+ sub newslinks : Private {
+     my ($self, $c) = @_;
+     
+     # Don't allow newslinks to last more than 5 minutes
+     $c->stash->{'cachecontrol'}{'newslinks'} = 300;
+     
+     # do whatever else the newslinks action does
+     # ...
+ }
+
+Marking a path as uncacheable is just as easy:
+
+ sub processlogin : Private {
+    my ($self, $c) = @_;
+
+    # Never cache the result of a login attempt
+    $c->stash->{'cachecontrol'}{'processlogin'} = 0;
+    
+    if ($c->authenticte({ ... })) {
+        # ...
+    }
+ }
+ 
+Setting an element in the cachecontrol hash allows any action to specify it's
+intentions with regard to cacheability. These intentions will be clear
+regardless of how the action was reached. Keep in mind also that only actions
+which affect the cacheability of the page should set a cachecontrol element.
+Other actions need not specify any cachecontrol data at all.
+
+Using this method, by the end of action processing, the cacheability of the
+generated data is clear to us, but we still have not communicated this to the
+requesting browser or cache server. Fortunately for us, Catalyst gives us the
+perfect place to handle this. The Root 'end' action.
+
+All we need to do is place a short routine in our end action that examines all
+the elements of the 'cachecontrol' hash to locate the shortest cache time,
+then we set the Cache-Control header accordingly.
+
+ sub end : ActionClass('RenderView') {
+     my ( $self, $c ) = @_;
+
+     # begin by setting our minumum cache time to our default cache time in seconds.
+     my $cachetime = 3600;
+
+     # check to see if we have an error - we don't want error pages to be cached
+     # so we force our cache-time to 0 in that case.
+     if ( scalar @{ $c->error }) {
+         $cachetime = 0;
+     } else {
+         # Look at each element of cachecontrol to find the shortest
+         # cache time set. 
+         
+         foreach my $section ( keys %{$c->stash->{'cachecontrol'}} ) {
+             my $time = ;
+             
+             # if the currently selected cache-control element is less
+             # than the page's cache-time - we drop the cache-time to 
+             # match the new limit.
+             if ($c->stash->{'cachecontrol'}{$section} < $cachetime) {
+                 $cachetime = $c->stash->{'cachecontrol'}{$section};
+             }
+         }
+     }
+     
+     # at this point - $cachetime should be set to the most restrictive
+     # time set by all of the actions. Now it's time to turn it into 
+     # a header that the cache server / browser can understand.  
+     
+     if ($cachetime == 0) {
+
+         # if $cachetime is 0 - then we can't cache the page and we 
+         # need to tell our requesting server / browser that.
+
+         $c->response->header('Cache-Control' => 'no-cache')
+
+     } else {
+
+         # otherwise we set max-age to the cache-time specified.
+         $c->response->header('Cache-Control' => 'max-age=' . $cachetime);
+     }
+
+ }
+
+Et Viola.  Your application is now cache-friendly.   
+
+=head2 But WAIT! or 'The Gotchas' 
+
+Adding cache times to your application is fine if the entirety of your site is
+public. If your application allows authentication and/or your pages are
+customized on a per-user basis then you have a bit more to think about.
+
+Any time your content is customized on a per-user basis, you are likely
+destroying the cacheability of that content. This means that you need to be
+very careful to set your $c->stash->{'cachecontrol'} appropriately whenever
+you are performing actions that rely on authenticated user. This can be tricky
+as any action may respond differently depending on whether the user is
+authenticated.
+
+The simplest solution to this particular problem is to simply force cachetime
+to 0 if a user is authenticated. This is as simple as adding the following to
+the end routine above:
+
+ sub end : ActionClass('RenderView') {
+     my ( $self, $c ) = @_;
+
+     # begin by setting our minumum cache time to our default cache time in seconds.
+     my $cachetime = 3600;
+     
+     if ( $c->user_exists() ) {
+         $cachetime = 0;
+     }
+
+     ## rest of routine from above
+     ...
+ }
+ 
+There are some remaining issues you need to be aware of when working with
+caching - particularly if you are running cache-servers in front of your web
+application. First, using the above method means that no logged-in users will
+generate pages that stay in your cache. This can be problematic, especially if
+your application caters mainly to registered users. The other problem is that
+when an unregistered user B<DOES> cause a page to be cached, registered users
+will receive that cached copy, even if they are logged in.
+
+To make matters worse, when you are making your site cacheable, you must pay
+attention not only to the code, but to the design and templates used on your
+site also.
+
+=head3 Visual Design, Templates and Cacheability
+
+It is a very common practice these days for web-sites to provide any logged in
+user with a conspicuous reminder that they are logged in. This usually comes
+in the form of a 'Welcome back Joe User!' or other customized content
+somewhere in the page. Many designers place this reminder on every page on the
+site. This presents a difficult problem for anyone wanting to make the site
+cacheable. How do you provide for personalization while still keeping your
+site cacheable?
+
+A few short years ago, this was a very difficult problem to solve indeed.
+Fortunately for us web technology has advanced to the point where it can be
+solved fairly easily. There are two obvious possibilities, depending on the
+complexity of the personalized content.
+
+If the content is simply a customized welcome message, the old standby of
+javascript + cookies provides a fairly neat solution. Set your custom message
+(or even just the User name) in a cookie, and let javascript display the
+custom message if the cookie is present. Your pages remain cacheable, and the
+non-js degraded mode page simply lacks the custom message.
+
+If the custom data is more complex, a 'News items for Joe User!' block, for
+example, the problem again becomes more complex. However, this too can be
+solved fairly easily using AJAX. If you make the custom block load via JS, the
+rest of the page can be cached, and only the customized content will be
+uncacheable. (In fact, with the judicious use of cookies, you can make the
+ajax request contain all the necessary data required to fulfill the request,
+and the response will become cacheable as well.)
+
+=head2 Finally...
+
+Apart from playing nicely with existing cache servers out there on the
+Internet, adding cache control to your site can have a tremendous impact on
+the robustness and scalability of your application. Today we covered how to
+add fine-grained cache control to your application in an extremely painless
+way. When applied well, a front-end caching server can allow your application
+to handle hundreds of times the traffic that it would otherwise be able to.
+
+Solving the technical problem is often only part of the puzzle.  Balancing
+cacheability and personalization can be difficult.  However, if you are conscious of
+the tradeoffs while designing your application most customization can be
+accomplished, while still retaining good performance-increasing cacheability.
+
+Good luck, and happy caching!
+
+=head1 AUTHOR
+
+jayk - Jay Kuri <jayk at cpan.org>