[Catalyst] Scalable Catalyst

Tomas Doran bobtfish at bobtfish.net
Thu Apr 30 12:31:27 GMT 2009


Alejandro Imass wrote:
> Anyway, the message is that with mod_worker/mod_perl you can spawn
> _thousands_ of threads, getting impressive concurrency (without
> counting the mutex). We have tested Catalyst applications that handle
> _thousands_ of concurrent requests using off the shelf AMD 64Bit HW
> and 12Gb RAM, with a Catalyst app of about 20MB RSS.

There is a big difference between having thousands of requests in-flight 
at once, and serving thousands of new requests a second.

You're saying that mod_worker can do the former well, without mentioning 
the latter.

I'd guess that in your configuration, most of your workers (and
requests) are just pushing bytes to the user, which isn't really a
hard job. :)

The reason that normal (prefork) mod_perl fails at this is that you
have one process per request, so having many, many requests in flight
at once hurts.

However, no matter how your concurrency is arranged, if you have
thousands of requests all actually trying to generate pages at once,
you're going to fail and die - full stop...

perl -e'system("perl -e\"while (1) {}\" \&") for (1..1000)'

will convince you of this, if you aren't already convinced. :)

You can trivially get round this by having a _small_ number of
mod_perl processes behind a proxy: your (expensive/large) mod_perl
process generates a page, throws it at network speed (1Gb/s or higher
if you're on localhost) to the proxy, and the proxy then streams it to
the user much, much more slowly. This frees up your mod_perl processes
as quickly as possible to get on with useful work.
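
To illustrate, the proxy side can be as simple as this (a minimal
nginx sketch, inside the http context; the backend name and port 8080
are made up, adjust to taste):

    upstream backend {
        # the small pool of heavyweight mod_perl processes
        server 127.0.0.1:8080;
    }

    server {
        listen 80;
        location / {
            # buffer the whole response, so the backend is freed
            # immediately while nginx spoon-feeds slow clients
            proxy_buffering on;
            proxy_pass http://backend;
        }
    }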

I'd also note that having more threads/processes generating pages than
you have CPU cores is fairly inefficient: the more processes you have,
the greater the penalty you incur from context-switching overhead.

That said, most apps spend a fair amount of time blocked on the
database, so one process per CPU core doesn't hold totally true for
best throughput either - YMMV.

For the record, one of my apps can trivially do 200 requests a second,
with 3000+ concurrent requests in-flight, using a single 4GB dual-core
x64 box with one disk, running both the application _and_ the MySQL
server.

It flattens the office's 100Mb/s pipe to the internet waaay before the
system actually starts to struggle from a load perspective.

That's nginx / FastCGI with 3 FastCGI worker processes (number of
cores + 1) - when benchmarking, I found that the most efficient setup
for that application.
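
In case it's useful, that setup looks roughly like this (assuming a
Catalyst app called MyApp served on a unix socket - names and paths
are made up). Start the FastCGI workers:

    script/myapp_fastcgi.pl -l /tmp/myapp.socket -n 3 -d

and point nginx at the socket:

    location / {
        include fastcgi_params;
        fastcgi_pass unix:/tmp/myapp.socket;
    }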

This is one of those things where your mileage varies significantly
depending on what your application is doing, and anyone else's answer
is going to be lies - you _need_ to test and optimise it yourself, for
your app and your workload. :)
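
(For a quick first pass, ab from the Apache httpd distribution does
the job - something like

    ab -n 10000 -c 200 http://localhost/some/page

- the numbers and URL here are made up, and a synthetic benchmark will
only ever approximate your real traffic.)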

Cheers
t0m


