[Catalyst-commits] r7307 - trunk/examples/CatalystAdvent/root/2007/pen
mblythe at dev.catalyst.perl.org
Mon Dec 17 16:34:43 GMT 2007
Author: mblythe
Date: 2007-12-17 16:34:42 +0000 (Mon, 17 Dec 2007)
New Revision: 7307
Modified:
trunk/examples/CatalystAdvent/root/2007/pen/20.pod
Log:
advent day 20: now with actual words!
Modified: trunk/examples/CatalystAdvent/root/2007/pen/20.pod
===================================================================
--- trunk/examples/CatalystAdvent/root/2007/pen/20.pod 2007-12-17 01:11:29 UTC (rev 7306)
+++ trunk/examples/CatalystAdvent/root/2007/pen/20.pod 2007-12-17 16:34:42 UTC (rev 7307)
@@ -1,5 +1,182 @@
=head1 Handling Growing FastCGI Processes
+You've finished writing your Catalyst application! You tested it thoroughly
+yourself, had some others help you with QA, and even wrote some unit tests.
+You just launched it into production, and you couldn't be more proud.
+
+But wait... after a few hours of production level traffic, something looks
+wrong. Those FCGI processes that started out around 40MB just keep
+getting bigger. Everything was fine at first, but then they grew to 100MB
+each. Now they're over 200MB each and have thrown the machine into swap!
+Help! What to do?
+
+If you're running on a Linux system, this article will give you some
+answers. If you're running on a different UNIX-like system, much of it
+should be easily adaptable. If you're on Windows... well, at least some
+of the concepts should apply. I think.
+
+=head2 Buy Some Time
+
+First, buy yourself some time. Since the processes start out small and
+grow over time, you can temporarily calm the chaos by restarting all of
+your FCGI child processes. Sound drastic? It depends on how desperate
+your situation already is. If your processes are growing without bound,
+your server(s) are thrashing in swap, and your web site response rate
+is grinding to a halt, you need some immediate relief. A simple restart
+will give you that.
+
+Assuming you are running the standard C<myapp_fastcgi.pl>, you can gracefully
+restart all of the child processes by sending a HUP signal to the manager
+(parent) process. Hopefully you started your app with the C<-p> switch
+and have a pidfile handy. If so, all you'll need to do is this:
+
+ kill -HUP `cat myapp.pid`
+
+Now check your process list again. Things should be back at the nice,
+sane, small size for another few hours.
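+
+If you didn't use C<-p> and have no pidfile, you can usually find the
+manager by name instead. This one-liner assumes a C<pgrep> that supports
+C<-f> (match against the full command line) and C<-o> (oldest matching
+process, which should be the manager rather than one of its children):
+
+    kill -HUP `pgrep -o -f myapp_fastcgi.pl`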
+
+=head2 The Root of the Evil
+
+So what is causing the memory bloat? Do you have a memory leak? While
+it's certainly possible, it could be something far less sinister. Before
+you spend the next several days chipping away at your sanity while hunting
+for leaks, you should consider the possibility that you are seeing normal
+B<copy-on-write> behavior. There is an excellent description of this effect
+in the book B<Practical mod_perl>, and you can read the relevant portions
+online:
+
+L<http://modperlbook.org/html/10-1-Sharing-Memory.html>
+
+In a nutshell, when the parent FCGI process forks a child process,
+much of the memory is shared. As the child process continues to run, it
+writes its own data into memory, which causes more and more of that shared
+memory to become unshared. Over time, this really starts to add up.
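+
+On newer Linux kernels you can actually watch this happening through
+C</proc/PID/smaps>. As a rough sketch (the pid is a placeholder for one
+of your FCGI children), this adds up the kilobytes still shared with the
+parent:
+
+    grep -E '^Shared_(Clean|Dirty):' /proc/12345/smaps | \
+        awk '{ total += $2 } END { print total, "KB shared" }'
+
+Run it periodically against the same child and you should see the shared
+total shrink as copy-on-write does its work.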
+
+Of course, you could actually have a memory leak as well. The root cause
+could be a combination of copy-on-write and leaky plumbing. Luckily, the
+solution for copy-on-write management happens to also buy you time for
+hunting down a memory leak.
+
+=head2 Process Management
+
+Those of you with C<mod_perl> experience may wonder where the FCGI
+equivalent to C<MaxRequestsPerChild> is. This configuration directive sets
+a maximum number of requests that each Apache instance should handle before
+it dies, releasing its memory, and allowing a fresh new Apache instance to
+be spawned in its place. Using this setting is a common and recommended
+practice when running mod_perl applications. It would certainly solve
+our FCGI problem, if only we had such a setting.
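+
+For reference, in the mod_perl world it's a single C<httpd.conf> line
+(the count here is just an example):
+
+    # Recycle each Apache child after it has served 1000 requests
+    MaxRequestsPerChild 1000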
+
+You could simply set an upper memory limit with C<ulimit> or the C<daemontools>
+C<softlimit> command. That way, if one of the children reached the
+predetermined upper limit, it would die; the FCGI manager would spawn a new
+child; all would be well. There is only one problem with this approach.
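+
+For illustration, such a cap under C<daemontools> might look like this
+(the path, socket, and 200MB limit are all made up):
+
+    softlimit -m 209715200 /path/to/myapp_fastcgi.pl -l /tmp/myapp.socket -n 5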
+
+When the child process hits the upper memory limit, it dies a horrible death.
+The unfortunate web user that process was serving will be cut off in
+mid-request, left with a broken page, and even worse -- a half-finished
+action on the backend. At best, this means a bad user experience. At worst,
+if you aren't using a fully transactional database system, it could mean
+data corruption.
+
+Faced with this dilemma and growing tired of restarting my application
+by hand every few hours, I reached for the tools I knew. I wrote my own
+process-reaping cron job that would gracefully kill any child processes that
+had grown too large. Thankfully, L<FCGI::ProcManager> gives us a relatively
+easy way to do this. If you send a HUP signal to an individual child FCGI
+process, it will finish handling its current request and then end. The
+FCGI manager will then spawn a new child to replace it. That is exactly
+what we want.
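+
+Unlike HUPping the manager, which recycles every child at once, this lets
+you target a single bloated process:
+
+    kill -HUP 12345    # the pid of one oversized FCGI child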
+
+=head2 The Script
+
+This script will check all of the child FCGI processes to see if any have
+grown beyond a specified memory size. If they have, it will send them a
+HUP signal, asking them to end after finishing their current request.
+
+I currently run this every 30 minutes via cron, just to be safe. You never
+know exactly when a process may cross that line:
+
+ */30 * * * * /path/to/kidreaper `cat /tmp/myapp.pid` 104857600
+
+The script is fairly simple and relies on a C<ps> command that supports the
+C<--ppid> switch:
+
+    #!/usr/bin/perl
+
+    use strict;
+    use warnings;
+
+ my ($ppid, $bytes) = @ARGV;
+
+ die "Usage: kidreaper PPID ram_limit_in_bytes\n" unless $ppid && $bytes;
+
+ my $kids;
+
+ if (open($kids, "/bin/ps -o pid=,vsz= --ppid $ppid|")) {
+ my @goners;
+
+ while (<$kids>) {
+ chomp;
+ my ($pid, $mem) = split;
+
+ # ps shows KB. we want bytes.
+ $mem *= 1024;
+
+ if ($mem >= $bytes) {
+ push @goners, $pid;
+ }
+ }
+
+ close($kids);
+
+ if (@goners) {
+ # kill them slowly, so that all connection serving
+ # children don't suddenly die at once.
+
+ foreach my $victim (@goners) {
+ kill 'HUP', $victim;
+ sleep 10;
+ }
+ }
+ }
+ else {
+ die "Can't get process list: $!\n";
+ }
+
+No, the script is not bullet-proof. It could stand to have better error checking,
+improved option handling, etc. It's a functional starting point, though.
+
+=head2 A Better Solution?
+
+Is there a better solution? Probably. I've thought of writing an L<FCGI::ProcManager>
+subclass that has child memory limit support built in. Maybe someday I will. For now,
+this C<kidreaper> cron is working nicely enough that I don't urgently need to pursue
+another solution. I get to focus on other needs, which I assume you'd like to do as well.
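+
+If you want to experiment in that direction, here is a rough, untested
+sketch of the idea. It assumes L<FCGI::ProcManager>'s C<pm_post_dispatch>
+hook and a Linux C</proc> filesystem, and the package name is invented:
+
+    package FCGI::ProcManager::SizeLimit;    # invented name
+
+    use strict;
+    use warnings;
+    use base 'FCGI::ProcManager';
+
+    # Per-child virtual size limit, in KB (100MB here).
+    use constant LIMIT_KB => 100 * 1024;
+
+    sub pm_post_dispatch {
+        my $self = shift;
+        $self->SUPER::pm_post_dispatch(@_);
+
+        # /proc/self/statm reports sizes in pages; assume 4KB pages.
+        open my $statm, '<', '/proc/self/statm' or return;
+        my $line = <$statm>;
+        close $statm;
+        my ($pages) = split ' ', $line;
+
+        # The current request is already finished, so exiting here is
+        # graceful; the manager will spawn a replacement child.
+        exit 0 if $pages * 4 > LIMIT_KB;
+    }
+
+    1;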
+
+=head2 Caveats
+
+If you find that sending a HUP signal to an FCGI child actually causes the whole manager
+process tree to die, you may have run into the same strange situation that I did on
+one of my boxes. It happened on B<only> one of my boxes, which of course had to be a
+production machine. I finally gave up chasing the root cause once the trail led into the
+L<POSIX> module, deciding it had something to do with either my Linux kernel or my Perl
+version. In any case, the bad behavior only happened when I used the C<-d> switch
+of the C<myapp_fastcgi.pl> script to B<daemonize> the process. On a tip from another
+Catalyst user, I switched to C<daemontools> for managing my application startup, since
+it does not require the daemonize switch. Presto! That problem was solved.
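+
+If you're curious, the C<run> script under C<daemontools> can be as simple
+as this (paths and switches are illustrative; note the absence of C<-d>):
+
+    #!/bin/sh
+    # /service/myapp/run -- supervised by daemontools, runs in the foreground
+    exec /path/to/myapp_fastcgi.pl -l /tmp/myapp.socket -n 5 -p /tmp/myapp.pid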
+
+=head1 SEE ALSO
+
+=over
+
+=item L<http://modperlbook.org/html/10-1-Sharing-Memory.html>
+
+=item L<http://cr.yp.to/daemontools.html>
+
+=item L<FCGI::ProcManager>
+
+=back
+
=head1 AUTHOR
-Mark Blythe
+Mark Blythe <perl at markblythe.com>