[Catalyst] Zeus and Catalyst

Tomas Doran bobtfish at bobtfish.net
Tue Jan 11 11:16:19 GMT 2011


On 10 Jan 2011, at 22:05, Duncan Garland wrote:
> Are there any known problems using two servers running Zeus with a
> load balancer, fastcgi, Catalyst and Oracle 10.2 on Red Hat?

None known :)

> Our servers lock up occasionally. Half a dozen times a day, which is
> often enough to cause real embarrassment. We restart them by killing
> the fastcgi process.
>
> The symptom is that something causes Catalyst to start returning  
> zero length pages.
>
> eg
>
> 77.242.199.1 - - [10/Jan/2011:20:22:22 +0000] "GET /fcgi/catalyst/xxxx/script/xxxx_fastcgi.pl/javascript/calculation_accept HTTP/1.1" 200 0 "http://www.xxxx.co.uk/fcgi/catalyst/xxxx/script/xxxx_fastcgi.pl/home" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13 ( .NET CLR 3.5.30729)"
>
> Half a dozen calls return this, then the server locks up completely  
> until the process is killed.
>

Can you define 'locks up completely' here - is it spinning at 100% CPU?

What does strace say the fcgi process is doing when it's locked up?  
Also, what does gdb say the stack is?

Is this fcgi on a unix domain socket, or a tcp socket?
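
To pin down when the zero length pages start relative to the lock-up, it may help to pull every "200 with 0 bytes" entry out of the access log. A minimal sketch, assuming the log is in the usual combined format with one entry per line (as the sample above appears to be); the function name and regex are illustrative, not anything Zeus or Catalyst ships:

```python
import re

# Assumed layout (combined log format):
#   host ident user [time] "request" status bytes "referer" "agent"
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def zero_length_hits(lines):
    """Return (time, request) for each 200 response that sent no body."""
    hits = []
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip lines that do not look like access log entries
        # A size of "0" (or "-", which some servers log for no bytes sent)
        # with status 200 is the symptom described in this thread.
        if m.group("status") == "200" and m.group("size") in ("0", "-"):
            hits.append((m.group("time"), m.group("request")))
    return hits
```

Feeding it an open log file (for example `zero_length_hits(open(path))`, with `path` being wherever your access log lives) gives timestamps you can line up against the strace and gdb output.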

> It doesn't happen on our other servers which don't have load  
> balancers. One of the first things we did was to force the load  
> balancer to always use the same server. That didn't help.

What is the balancer doing for its health checks? Maybe this is what  
is making it sick for some reason? (I've seen cases (non-Catalyst)  
where load balancers making half requests / slightly mad requests /  
closing the socket before getting a response / other mad things have  
caused massive lossage, so it's worth thinking about)
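
One way to test that theory is to imitate a badly behaved health check by hand and watch whether the app gets sick. A rough sketch, assuming the balancer probes over plain HTTP on a TCP port; the function name, the `example.invalid` host header and the default payload are made up for illustration, and this should be pointed at a test box rather than production:

```python
import socket


def half_request(host, port,
                 payload=b"GET / HTTP/1.1\r\nHost: example.invalid\r\n",
                 timeout=2.0):
    """Send an incomplete HTTP request, then hang up without reading.

    The default payload deliberately lacks the blank line that ends the
    request headers, so the server is left waiting when the socket closes.
    Returns the number of bytes sent.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)
        # Leaving the with-block closes the socket before any response is read.
    return len(payload)
```

Calling this in a loop against the web server while tailing the logs (and with strace attached to the fcgi process) should show fairly quickly whether half requests are enough to trigger the zero length responses.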

> Could it be jumping out of the error handling and actually returning  
> zero content? I can't see where.

It could be crapping itself half way through error handling?

> Is there a module dependency which would cause this?

No, I don't think so, but try running the latest release of FCGI.pm to  
be sure..

> Is it Zeus or fastcgi?

One of these - something I _have_ seen in the past is that some (oh  
hai nginx!) web servers' fcgi implementations can get real confused if  
you output a load of stuff to the fcgi error channel..

So try running your fastcgi with --keeperr, see if that has any effect..

Also, try loading Devel::SimpleTrace - with --keeperr  
and ::SimpleTrace then you're more likely to get helpful errors out  
before it all locks up.

Cheers
t0m

P.S. Once you have --keeperr on, then trying this: http://use.perl.org/~jjore/journal/39319 
  could be useful to get a perl level backtrace out..



