[Catalyst] Handling long-running processes (was Re: Logging is not immediate)

Tue Oct 19 01:54:55 GMT 2010

On 19/10/10 11:05, Bill Moseley wrote:
>
>
> On Mon, Oct 18, 2010 at 4:46 PM, Ken Beal <KBeal at crosscountry-auto.com>
> wrote:
>
>     The issue is that when I call $c->log(), it doesn’t output anything
>     until the “URL call” completes. This makes it difficult to watch a
>     long-running process, because I don’t see anything until it’s done,
>     and I don’t know if it’s hung up on something because I can’t see
>     the log output.
>
>
> maybe try this:  $c->log->_flush;
>
> Or try: warn "I'm stuck in a loop and the web user is wondering why I'm
> taking so long and will likely hit reload any second now!\n";

As a follow-up to Ken's message..

If you have long-running processes in a web server, then I suspect you 
are Doing It Wrong.
(Or maybe you're not, and have some kind of RPC system under Catalyst, 
in which case ignore what I'm about to say.)

Can I suggest you look at a different model of serving these 
long-running things to the web users? Consider having them hit a page 
which says "Your request is being processed..", which then kicks off the 
actual work in another process, and returns a token of some sort to the 
user's browser (e.g. a cookie or URI parameter).

You then have the browser either reload the page (including the token 
mentioned above) or, better yet, use JavaScript to check in the 
background. There are two options for that background request: it can 
block on the server until the process completes, or it can return 
quickly with status or progress info to display in the meantime.
Once the background request detects that the long-running process has 
completed, it directs the browser to reload and collect the final 
information.
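
To make the shape of this concrete, here is a minimal sketch in Python
rather than Catalyst code. Everything in it is an assumption for
illustration: the names start_job, job_status and build_report are
hypothetical, the job table lives in memory, and a thread stands in for
the separate process described above.

```python
import threading
import time
import uuid

# Hypothetical in-memory job table. A real deployment would more likely
# use a database or job queue shared by the web server and the workers.
jobs = {}
jobs_lock = threading.Lock()


def build_report(token, steps=5):
    """Stand-in for the long-running work; records progress as it goes."""
    for done in range(1, steps + 1):
        time.sleep(0.01)  # pretend to do one slice of the work
        with jobs_lock:
            jobs[token]["progress"] = done / steps
    with jobs_lock:
        jobs[token]["status"] = "done"


def start_job():
    """Register a job, start it outside the request, return its token."""
    token = uuid.uuid4().hex[:20]
    with jobs_lock:
        jobs[token] = {"status": "running", "progress": 0.0}
    # A thread keeps the sketch self-contained; it is not a separate process.
    threading.Thread(target=build_report, args=(token,), daemon=True).start()
    return token


def job_status(token):
    """What a quick progress request could report for a given token."""
    with jobs_lock:
        job = jobs.get(token)
        if job is None:
            return {"status": "unknown"}
        return {"status": job["status"], "progress": job["progress"]}
```

The request that calls start_job() returns right away with the token,
and each later progress request only has to call job_status(token).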

This has several advantages.
1) If the user keeps hitting reload, you can detect their work token, 
and avoid starting more long-running processes.
2) The user doesn't get stuck with a blank screen and a loading.... 
status bar. Instead they can get progress info, or at least a message 
saying "we're working on it, hang in there..."
3) You don't have your web server tied up with long-running processes, 
holding open sockets and using memory.
4) Logging for your long-running processes is independent of your web 
server messages.
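
Advantage 1 can be sketched roughly as follows, again in Python and
under assumptions: handle_report_request and launch_background_work are
hypothetical names, and the set of known tokens would really live in
shared storage rather than in the web process.

```python
import uuid

known_tokens = set()  # tokens already issued (assumed shared storage)
started = []          # tokens for which work was actually launched


def launch_background_work(token):
    """Placeholder: a real handler would spawn a worker or enqueue a job."""
    started.append(token)


def handle_report_request(token=None):
    """Return (token, launched_new_job) for a hit on the report page."""
    if token in known_tokens:
        # A reload carrying a known token: do not start the work again.
        return token, False
    # First visit, or an unrecognised token: issue a fresh one and start.
    token = uuid.uuid4().hex[:20]
    known_tokens.add(token)
    launch_background_work(token)
    return token, True
```

However many times the user reloads with the same token, only the first
request launches any work.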

So the URLs the user would see would be something like:
/report/megasize
# server kicks off background process, redirects user onto..
/report/megasize?token=d11a1658f614401782e8
# which says "please wait while we prepare your report".
# The user's browser waits for completion by polling:
/report/progress.json?token=d11a1658f614401782e8
# ..and eventually reloads:
/report/megasize?token=d11a1658f614401782e8
# which now displays the contents of their report
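
As a rough illustration of what the server hands back at each step, the
redirect target and the polling response might be built like this. The
JSON field names ("progress", "done") are assumptions; the URLs are the
ones listed above.

```python
import json
from urllib.parse import urlencode


def redirect_location(token):
    """URL the user is redirected to after the job has been started."""
    return "/report/megasize?" + urlencode({"token": token})


def progress_url(token):
    """URL the browser polls in the background."""
    return "/report/progress.json?" + urlencode({"token": token})


def progress_body(progress, done):
    """Body of the polling response; the field names are hypothetical."""
    return json.dumps({"progress": progress, "done": done}, sort_keys=True)
```

When the browser sees "done" set to true, it reloads the
/report/megasize URL with the same token to fetch the finished report.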

At least, this is the way I think it should be done.

I'm curious to know how other people approach this issue.

Also, what do you think about the polling approach vs a (background) 
connection that stays connected waiting for the completion signal?
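
For comparison, the two styles differ mainly in whether the status
request answers at once or blocks until there is news. A small Python
sketch, assuming a hypothetical Job class built on threading.Event:

```python
import threading


class Job:
    """Tracks completion of one long-running task (hypothetical helper)."""

    def __init__(self):
        self._done = threading.Event()

    def finish(self):
        """Called by the worker when the task has completed."""
        self._done.set()

    def poll(self):
        """Polling style: answer immediately with the current state."""
        return self._done.is_set()

    def wait_for_completion(self, timeout):
        """Held-connection style: block for up to `timeout` seconds.

        Returns True if the job finished in that time, False otherwise.
        """
        return self._done.wait(timeout)
```

With poll() each request is cheap but the browser has to keep asking;
with wait_for_completion() the browser hears sooner, at the cost of a
server connection held open for up to the timeout.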

Cheers,
Toby