[Catalyst] Scalable Catalyst

Mon Jun 29 16:28:38 GMT 2009

Hi! Sorry for the slow reply; I've been buried in a project and only
recently saw the light of day :-)

Yes, you are correct [Tomas], BUT it all depends on the type of
application. Web concurrency is often misunderstood. The application
I was referring to needs to keep many, many concurrent processes
waiting for a response from another service that has a long response
time. In that case, having many, many threads sitting there waiting
for a response is the way to go.
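A minimal Python sketch of that wait-bound case, assuming the slow upstream service can be stood in for by `time.sleep` (the 0.1 s latency, the request counts and the function names are hypothetical, not taken from the application described). Because a waiting thread uses almost no CPU, twenty requests on twenty threads finish in roughly one upstream round-trip instead of twenty.

```python
import time
from concurrent.futures import ThreadPoolExecutor

UPSTREAM_LATENCY = 0.1  # seconds; hypothetical slow backend service


def handle_request(request_id: int) -> int:
    # Stand-in for a request handler that blocks on a slow remote service.
    # time.sleep yields the CPU, much like a blocking socket read would.
    time.sleep(UPSTREAM_LATENCY)
    return request_id


def serve(n_requests: int, n_workers: int) -> float:
    """Serve n_requests with n_workers threads; return elapsed wall time in seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(handle_request, range(n_requests)))
    assert results == list(range(n_requests))
    return time.perf_counter() - start


if __name__ == "__main__":
    # One worker: the 20 waits run back to back (about 20 x 0.1 s = 2 s).
    print(f"1 worker:   {serve(20, 1):.2f} s")
    # Twenty workers: the waits overlap (about 0.1 s in total).
    print(f"20 workers: {serve(20, 20):.2f} s")
```

The same trick does nothing for requests that are busy computing rather than waiting, which is the distinction the rest of this thread turns on.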

Web concurrency is usually a balance between:

1) The available RAM which limits the number of processes/threads you can load.
2) The time it takes for you to process a given request, and the CPU
power required to do it.
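Those two limits can be turned into a back-of-the-envelope estimate. The Python sketch below is a simplified model, not a measurement: it applies Little's law (throughput = requests in flight / response time) and caps the result at what the cores can actually execute. The 12 GB / 20 MB figures echo the setup quoted further down; the function names, core count and per-request CPU and wait times are hypothetical.

```python
def max_workers_by_ram(ram_mb: float, rss_per_worker_mb: float,
                       reserved_mb: float = 0.0) -> int:
    """Limit 1: how many workers of a given RSS fit in the available RAM."""
    return int((ram_mb - reserved_mb) // rss_per_worker_mb)


def max_rps_by_cpu(cores: int, cpu_s_per_request: float) -> float:
    """Limit 2: throughput ceiling when every core is busy all the time."""
    return cores / cpu_s_per_request


def estimated_rps(workers: int, cores: int, cpu_s: float, wait_s: float) -> float:
    """Little's law (throughput = requests in flight / response time),
    capped by the CPU ceiling. Ignores context switching and contention."""
    in_flight_limit = workers / (cpu_s + wait_s)
    return min(in_flight_limit, max_rps_by_cpu(cores, cpu_s))


if __name__ == "__main__":
    workers = max_workers_by_ram(12 * 1024, 20)  # 12 GB of RAM, 20 MB per worker
    print("workers that fit in RAM:", workers)    # 614
    # Hypothetical workload: 10 ms of CPU per request on 4 cores.
    # Slow upstream (2 s wait per request): RAM-bound, about 305 req/s.
    print(round(estimated_rps(workers, 4, 0.01, 2.0)))
    # No waiting at all: CPU-bound, capped at 400 req/s.
    print(round(estimated_rps(workers, 4, 0.01, 0.0)))
```

With long waits the worker count (and therefore RAM) sets the ceiling; with no waiting, extra workers add nothing once the cores are saturated.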

Running concurrency tests with httperf and ab (ApacheBench) will give you
a good idea of the requests per second you get for a given number of
processes/threads, and monitoring CPU usage with top or similar shows how
hard the machine is working. It's empirical, but it works. This way you
can estimate how many parallel workers (be they processes or threads)
your machine is actually able to crank through.
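Against a real server, ab and httperf are the right tools. The Python sketch below only mimics the same procedure in-process to show what to look for; it is a toy harness, not a replacement. For each concurrency level (comparable to ab's `-c` option) it fires a fixed number of calls at a handler and reports requests per second alongside CPU seconds consumed per wall-clock second, the figure you would otherwise read from top. The `sweep` and `slow_upstream` names and the 20 ms sleep are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Tuple


def sweep(handler: Callable[[], None], n_requests: int,
          levels: Iterable[int]) -> Dict[int, Tuple[float, float]]:
    """For each concurrency level, return (requests/sec, CPU seconds per wall second)."""
    report = {}
    for level in levels:
        wall0, cpu0 = time.perf_counter(), time.process_time()
        with ThreadPoolExecutor(max_workers=level) as pool:
            futures = [pool.submit(handler) for _ in range(n_requests)]
            for future in futures:
                future.result()  # re-raise any error from the handler
        wall = time.perf_counter() - wall0
        cpu = time.process_time() - cpu0  # CPU time of all threads in this process
        report[level] = (n_requests / wall, cpu / wall)
    return report


def slow_upstream() -> None:
    # Hypothetical handler: mostly waiting, very little CPU.
    time.sleep(0.02)


if __name__ == "__main__":
    for level, (rps, cpu_load) in sweep(slow_upstream, 100, [1, 10, 50]).items():
        print(f"c={level:>3}  {rps:8.1f} req/s  CPU busy {cpu_load:.2f} cores")
```

For a wait-bound handler, requests per second climb with concurrency while CPU stays near idle; for a CPU-bound one, the CPU figure hits the core count and requests per second stop rising. That knee is the practical limit.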

Multi-threaded mod_perl (with Apache's worker MPM, mod_worker) is only an
advantage if you actually have the CPU power to process the threads in
parallel. If not, the work just becomes sequential, with each thread
getting its share of the available CPU time. On the other hand, in the
traditional process-only (pre-fork) model the usual case is that CPU load
is low relative to RAM usage: each process is so large that your RAM
fills up with very few processes, so you're not taking full advantage of
your CPU power. By running mod_perl under mod_worker you can use
considerably less RAM and put more actual work on the CPUs, but that
brings me back to my original comment at the top: it all depends on the
application. There are always many other things to consider, such as
static content, file uploads, streaming content and other tasks, which
are almost certainly better handled outside your application.

Also, as you say, today's large applications should run behind
reverse proxies/load balancers, which can also pick up the tab for
static file serving and other optimizations.
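The gain from putting a proxy in front can be estimated with simple arithmetic: a backend worker is tied up for the time it takes to generate the page plus the time to push it down the wire. A rough Python sketch, where the page size, generation time and the 1 Mbit/s client link are hypothetical and the 1 Gbit/s localhost hand-off matches the figure t0m gives below:

```python
def worker_busy_seconds(gen_s: float, page_bytes: int, link_bits_per_s: float) -> float:
    """Time a backend worker is occupied: generate the page, then send it."""
    return gen_s + page_bytes * 8 / link_bits_per_s


PAGE = 100_000  # bytes; hypothetical page size
GEN = 0.05      # seconds spent generating the page; hypothetical

direct = worker_busy_seconds(GEN, PAGE, 1_000_000)       # slow 1 Mbit/s client
proxied = worker_busy_seconds(GEN, PAGE, 1_000_000_000)  # 1 Gbit/s to a local proxy

print(f"serving the client directly: {direct:.3f} s per request")
print(f"handing off to a proxy:      {proxied:.4f} s per request")
print(f"same worker pool serves about {direct / proxied:.0f}x more requests")
```

This ignores the proxy's own memory and CPU, which is typically far smaller per connection than a mod_perl process but not zero; the point is only that the expensive worker is released much sooner.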

This is a very interesting, diverse and complex subject, but the main
point of my post was that Catalyst works well under multi-threaded
Apache with mod_perl, allowing, _in some cases_, better use of the
available resources. Of course, it does not apply to all cases, and
your insight explains this very well.

BTW, Ashley suggested I write a how-to on the wiki or something like
that. Could someone suggest exactly where? I may have time to do that
this week.

Best,
Alejandro Imass

On Fri, May 1, 2009 at 8:01 AM, Tomas Doran <bobtfish at bobtfish.net> wrote:
> Alejandro Imass wrote:
>>
>> Anyway, the message is that with mod_worker/mod_perl you can spawn
>> _thousands_ of threads, getting impressive concurrency (without
>> counting the mutex). We have tested Catalyst applications that handle
>> _thousands_ of concurrent requests using off the shelf AMD 64Bit HW
>> and 12GB RAM, with a Catalyst app of about 20MB RSS.
>
> There is a big difference between having thousands of requests in-flight at
> once, and serving thousands of new requests a second.
>
> You're saying that mod_worker can do the former well, without mentioning the
> latter.
>
> I'd very much guess that in your configuration, most of your workers (and
> requests) are just pushing bytes to the user, which isn't really a hard
> job.. :_)
>
> The reason that normal mod_perl fails at this is you have one process per
> request, and so having many many requests in flight at once hurts.
>
> However, if you have thousands of requests all trying to generate pages at
> once, you're going to fail and die - full stop...
>
> perl -e'system("perl -e\"while (1) {}\" \&") for (1..1000)'
>
> will convince you of this if you aren't already :)
>
> You can trivially get round this by having a _small_ number of mod_perl
> processes behind a proxy, so that your (expensive/large) mod_perl process
> generates a page, then throws it at network speed (1Gb/s or higher if you're
> on localhost) to the proxy, which then streams it to the user much much
> slower. This frees up your mod_perl processes as quickly as possible to be
> getting on with useful work.
>
> I'd also note that having more threads/processes generating pages than you
> have CPU cores is fairly inefficient, as the more processes you have, the
> greater the penalty you're going to incur due to increased context switching
> overhead.
>
> Quite often you block on the database in most apps, which means that 1
> process per CPU core doesn't hold totally true for best throughput, so
> YMMV..
>
> For the record, one of my apps can trivially do 200 requests a second, with
> 3000+ concurrent requests in-flight, using a single 4GB dual core x64 box
> with one disk, running both the application _and_ the mysql server..
>
> It flattens the 100Mb pipe to the internet I have in the office waaay before
> the system actually starts to struggle from a load perspective..
>
> That's nginx / fastcgi with 3 fcgi worker processes (no of cores +1) - when
> benchmarking I found this most efficient for that application.
>
> This is one of the things that your mileage varies significantly, depending
> on what your application is doing, and anyone else's answer is going to be
> lies - you _need_ to test and optimise it yourself for your app and your
> workload. :)
>
> Cheers
> t0m
>
> _______________________________________________
> List: Catalyst at lists.scsys.co.uk
> Listinfo: http://lists.scsys.co.uk/cgi-bin/mailman/listinfo/catalyst
> Searchable archive: http://www.mail-archive.com/catalyst@lists.scsys.co.uk/
> Dev site: http://dev.catalyst.perl.org/
>