[Catalyst] Scalable Catalyst

Wed Jul 1 20:37:17 GMT 2009

On Thu, Jul 2, 2009 at 4:08 AM, Carl Johnstone
<catalyst at fadetoblack.me.uk> wrote:
> I think that the mod_perl mailing list would also be interested in this -
> there are very few people on that list with practical examples of
> multi-threading. As far as I'm aware, pre-fork is still pretty much the only
> model recommended.
>

Hmm. Well, I think many people are still stuck in the paradigms of
mod_perl before 2.0 and Apache 2.0.x. I didn't try this by mere
chance: after some research we found two interesting references that
led me in this direction:

1) "Scaling Apache 2.x beyond 20,000 concurrent downloads" by Colm MacCárthaigh
<colm at stdlib.net>, 21 July 2005. See section 3.1, "Choosing an MPM". The
results in 2005 were promising, and I figured that by 2008 many of the
remaining limitations had been overcome. This paper also helped me
fine-tune the Linux VM and led me to try FreeBSD.

2) "Practical mod_perl", O'Reilly, May 2003. See section 24.3,
"What's new in mod_perl 2.0", specifically 24.3.1 "Thread Support".
When I read this section, I decided to give it a try, as it was a
perfect fit for our blocking-process problem (i.e. many lightweight
threads, each blocking for several seconds).
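As a language-neutral illustration of why threads suit this workload, here is a minimal Python sketch (not mod_perl code). It assumes a hypothetical `slow_backend_call` that stands in for the remote service by simply sleeping; each request gets its own thread, so the total wall time tracks the slowest call rather than the sum of all calls.

```python
import threading
import time

def slow_backend_call(delay: float) -> str:
    # Hypothetical stand-in for the slow remote service: the thread
    # just blocks (sleeps) for `delay` seconds, using no CPU meanwhile.
    time.sleep(delay)
    return f"done after {delay:.2f}s"

def serve_thread_per_request(delays):
    """Handle each 'request' on its own thread; return (results, elapsed seconds)."""
    results = [None] * len(delays)

    def handle(i: int, delay: float) -> None:
        results[i] = slow_backend_call(delay)

    start = time.monotonic()
    threads = [threading.Thread(target=handle, args=(i, d)) for i, d in enumerate(delays)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, time.monotonic() - start

if __name__ == "__main__":
    # 64 concurrent requests blocking between 0.1 and 0.7 s each
    # (a scaled-down version of the 1 to 7 s backend latency).
    delays = [0.1 + 0.6 * (i % 7) / 6 for i in range(64)]
    results, elapsed = serve_thread_per_request(delays)
    print(f"{len(results)} requests in {elapsed:.2f}s (sum of delays: {sum(delays):.1f}s)")
```

The threads spend almost all their time asleep, which is why a few dozen of them per process cost little CPU.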

> Alejandro Imass wrote:
>> Ok. What would you have done? - not meant as a defensive question but
>> really, we would like to hear options for this application!
>
> I would've probably pushed for a change in the architecture, so that the
> browser makes a request then polls for results. Don't under-estimate the
> ability of users to hammer the F5 button because the page has taken 2
> seconds longer to come back than they expected!
>
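Carl's request-then-poll suggestion can be sketched as follows. This is a minimal Python illustration, not Catalyst code: `JobStore`, `submit`, `poll` and `slow_reply` are hypothetical names, a background thread stands in for the slow backend call, and a real multi-process Apache deployment would need a shared store (e.g. a database) rather than an in-memory dict.

```python
import threading
import time
import uuid

_PENDING = object()  # sentinel: job submitted but not finished yet

class JobStore:
    """Hypothetical in-memory store for a submit-then-poll API."""

    def __init__(self):
        self._lock = threading.Lock()
        self._jobs = {}  # job_id -> result, or _PENDING

    def submit(self, work, *args) -> str:
        # Register the job, run it in the background and return an id
        # immediately, so the client's HTTP request does not stay open.
        job_id = uuid.uuid4().hex
        with self._lock:
            self._jobs[job_id] = _PENDING

        def run():
            result = work(*args)
            with self._lock:
                self._jobs[job_id] = result

        threading.Thread(target=run, daemon=True).start()
        return job_id

    def poll(self, job_id: str):
        # ("pending", None) until the job finishes, then ("done", result).
        # Raises KeyError for an unknown job id.
        with self._lock:
            result = self._jobs[job_id]
        if result is _PENDING:
            return ("pending", None)
        return ("done", result)

def slow_reply(delay: float) -> str:
    # Hypothetical stand-in for the 1 to 7 s backend call.
    time.sleep(delay)
    return "backend reply"

if __name__ == "__main__":
    store = JobStore()
    job = store.submit(slow_reply, 0.3)
    while store.poll(job)[0] == "pending":
        time.sleep(0.05)  # a real client would re-poll, ideally with backoff
    print(store.poll(job))  # ('done', 'backend reply')
```

Each poll is a short request, so an impatient user hitting F5 only costs a cheap lookup instead of another long-held connection.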

Changing the architecture was not possible. The other service takes
1 to 7 seconds to respond. We are a B2B server: there is no end user
here. We are just an intermediate server that has to hold the incoming
load while waiting on the slow service.

> However I do find your choice of solution interesting, as you've essentially
> managed to get a fairly out-of-the-box solution working. There are a bunch
> of things that could be done to process this type of workload quicker, but
> with the disadvantage that you've got a bigger custom code-base to maintain.
>

Not at all the case. We chose Catalyst for its design pattern and
multiple deployment options, plus the fact that we do a lot of work in
Catalyst and in Perl in general. The performance of our own code was
not an issue; the issue was the blocking calls to the other service.

> I'm curious about the memory differences between pre-fork and threaded in
> mod_perl from your testing. General mod_perl advice is to pre-load as much
> perl code and data as possible and take advantage of the copy-on-write
> aspects of VM. Did you push this? How much difference was there between the
> models?

Oh, of course. We did some profiling (valgrind and the like) to find
and reduce memory problems, and we stripped our Catalyst app down to
the bare essentials. For example, our API has HTML form-based as well
as XML interfaces, and FormBuilder was used for form validation; it
proved particularly costly, so it had to be removed, along with
several other plugins for the same reason.

For your reference, the app was about 30MB in the stand-alone Catalyst
server, and curiously not very different on 64-bit. In pre-fork, the
average size of each Apache process was not very different either: I
don't have the exact numbers with me, but they were around 40MB RSS
each. With 64 threads, the base Apache process grew to 115MB (170MB
VSZ), and each actual serving process flattened out at 200MB RSS
(750MB VSZ) after several thousand hits. So if you have enough CPU
power, the RAM benefits are great. What I have seen happen many times,
including in our case, is that the CPU was either under-utilized or
very busy waiting on swap.
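To make the comparison concrete, here is a rough back-of-the-envelope calculation in Python using the approximate figures quoted above (40MB per pre-forked process; a 115MB base process plus 200MB per 64-thread serving process). The function names are hypothetical, and because RSS includes copy-on-write pages shared with the parent, summing it overstates real usage (especially for pre-fork), so treat the output as an upper-bound illustration, not a measurement.

```python
import math

# Approximate RSS figures quoted in the message, in MB.
PREFORK_PER_PROCESS_MB = 40    # each pre-forked process serves one request at a time
THREADED_BASE_MB = 115         # base Apache process in the threaded setup
THREADED_PER_PROCESS_MB = 200  # each serving process at steady state
THREADS_PER_PROCESS = 64

def prefork_mb(concurrent: int) -> int:
    # One process per in-flight request (parent process ignored).
    return concurrent * PREFORK_PER_PROCESS_MB

def threaded_mb(concurrent: int) -> int:
    # Enough 64-thread serving processes to cover the in-flight
    # requests, plus the base process.
    processes = math.ceil(concurrent / THREADS_PER_PROCESS)
    return THREADED_BASE_MB + processes * THREADED_PER_PROCESS_MB

if __name__ == "__main__":
    for n in (64, 1000):
        print(f"{n:>5} concurrent: pre-fork ~{prefork_mb(n)} MB, threaded ~{threaded_mb(n)} MB")
    # Per-thread share of a serving process: 200 / 64 = 3.125 MB
    print(f"per thread: {THREADED_PER_PROCESS_MB / THREADS_PER_PROCESS} MB")
```

For 1,000 simultaneously blocked requests this gives roughly 40,000MB for pre-fork versus about 3,315MB threaded, i.e. about 3MB per thread instead of 40MB per process.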

Just FYI: a similar prototype in POE was about 20MB, so no major
savings there. That's when we looked at AnyEvent with EV (from Perl).
This last prototype did prove a viable option, but the TCO was
worrying, and we had already invested in and completed the Catalyst
app. If I were to rewrite from scratch, I would probably revisit the
AnyEvent/EV path.
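For comparison, an event loop such as AnyEvent with EV avoids dedicating a thread to each waiting request. The sketch below uses Python's asyncio as an analogous event loop; it is not AnyEvent's API, `slow_backend_call` and `serve_all` are hypothetical names, and `asyncio.sleep` stands in for a non-blocking call to the slow service.

```python
import asyncio
import time

async def slow_backend_call(delay: float) -> str:
    # Hypothetical non-blocking stand-in for the slow service: awaiting
    # yields control to the event loop instead of tying up a thread.
    await asyncio.sleep(delay)
    return "ok"

async def serve_all(n_requests: int, delay: float):
    # All requests wait concurrently on a single thread.
    return await asyncio.gather(*(slow_backend_call(delay) for _ in range(n_requests)))

if __name__ == "__main__":
    start = time.monotonic()
    replies = asyncio.run(serve_all(5000, 0.5))
    print(f"{len(replies)} replies in {time.monotonic() - start:.2f}s on one thread")
```

The trade-off is the one described above: each pending request costs only a small coroutine object, but the application must be written in this non-blocking style throughout.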

For an app that has to block thousands of incoming HTTP requests for
several seconds each, there are really very few options, and at some
point you have to dedicate a process or thread (an LWP, i.e. a
lightweight process, on Linux and the like) to each one. When we
realized this, we started looking for references and decided to try
threaded mod_perl. The results, as you can see, were really amazing.
Furthermore, being able to develop the app logic easily with the
stand-alone server (the MVC design pattern, an ORM like CDBI, etc.)
and then deal with the deployment problems with little or no change to
the code was just phenomenal, proving that Catalyst is a very good
choice (if not the best) for developing any high-end application.

IMHO, a lot of the thread paranoia was justified before Perl 5.8,
mod_perl 2.0 and Apache 2.0.x. I can't say for sure, as I am by no
means an expert, but for us it was just compile, configure and
rock-and-roll.

Best,
Alejandro

>
> Carl
>
>
> _______________________________________________
> List: Catalyst at lists.scsys.co.uk
> Listinfo: http://lists.scsys.co.uk/cgi-bin/mailman/listinfo/catalyst
> Searchable archive: http://www.mail-archive.com/catalyst@lists.scsys.co.uk/
> Dev site: http://dev.catalyst.perl.org/
>