[Catalyst] Catalyst::Component/Model Instances and Attributes per Request

Sun Jul 18 18:58:12 GMT 2010

On Sun, Jul 18, 2010 at 2:23 PM, Alejandro Imass
<alejandro.imass at gmail.com> wrote:
> On Sun, Jul 18, 2010 at 12:29 PM, Tomas Doran <bobtfish at bobtfish.net> wrote:
>>
>> On 17 Jul 2010, at 18:58, Alejandro Imass wrote:
>>>
[..]
>> Perl code isn't in the 'code segment' of your process.
>>
>> As it's compiled and run dynamically, after all. :)
>>
>> So only the perl executable and C shared libraries (.sos) will be in code
>> segments. All of your perl code is in data segments.
>>
>
> Yes, this is true, just a manner of trying to explain myself about the
> separation of variables and code. Nevertheless, the thread
> implementation does something like this when mounting a module via use.
> I am almost sure but will look for references on this on a mod_perl
> book.
>

Yes. When I wrote of "data segment" I was actually referring to the
mutable data, which is analogous to, but definitely not the same thing
as, a real data segment, as you have correctly pointed out. See
O'Reilly's "Practical mod_perl" (I have an old 2003 copy), section
24.3.1, Thread Support:

<quote>
In order to adapt to the Apache 2.0 threads architecture (for threaded
MPMs), mod_perl 2.0 needs to use thread-safe Perl interpreters, also
known as ithreads (interpreter threads). This mechanism is enabled at
compile time and ensures that each Perl interpreter instance is
reentrant—that is, multiple Perl interpreters can be used concurrently
within the same process without locking, as each instance has its own
copy of any mutable data (symbol tables, stacks, etc.). This of course
requires that each Perl interpreter instance is accessed by only one
thread at any given time.
</quote>

And this is what I meant by sharing the "code segment", and again your
clarification is precise:

<quote>
The first mod_perl generation has only a single PerlInterpreter, which
is constructed by the parent process, then inherited across the forks
to child processes. mod_perl 2.0 has a configurable number of
PerlInterpreters and two classes of interpreters, parent and clone. A
parent is like in mod_perl 1.0, where the main interpreter created at
startup time compiles any preloaded Perl code. A clone is created from
the parent using the Perl API perl_clone( ) function. At request time,
parent interpreters are used only for making more clones, as the
clones are the interpreters that actually handle requests. Care is
taken by Perl to copy only mutable data, which means that no runtime
locking is required and read-only data such as the syntax tree is
shared from the parent, which should reduce the overall mod_perl
memory footprint.
</quote>

I guess when I first read it I immediately associated it with SO
executables and shared libs, and that's the picture that stuck in my
mind. Fortunately I remembered the source!

From this bit, and although memory allocations for variables persist
across multiple requests, it seems clear from the following text that
everything is handled on a per-request basis:

<quote>
Rather than creating a PerlInterpreter for each thread, by default
mod_perl creates a pool of interpreters. The pool mechanism helps cut
down memory usage a great deal. As already mentioned, the syntax tree
is shared between all cloned interpreters. If your server is serving
more than just mod_perl requests, having a smaller number of
PerlInterpreters than the number of threads will clearly cut down on
memory usage. Finally, perhaps the biggest win is memory reuse: as
calls are made into Perl subroutines, memory allocations are made for
variables when they are used for the first time. Subsequent use of
variables may allocate more memory; e.g., if a scalar variable needs
to hold a longer string than it did before, or an array has new
elements added. As an optimization, Perl hangs onto these allocations,
even though their values go out of scope. mod_perl 2.0 has much better
control over which PerlInterpreters are used for incoming requests.
The interpreters are stored in two linked lists, one for available
interpreters and another for busy ones. When needed to handle a
request, one interpreter is taken from the head of the available list,
and it's put back at the head of the same list when it's done. This
means that if, for example, you have ten interpreters configured to be
cloned at startup time, but no more than five are ever used
concurrently, those five continue to reuse Perl's allocations, while
the other five remain much smaller, but ready to go if the need
arises.
</quote>
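
To convince myself of the "head of the list" behaviour described in that
quote, here is a minimal sketch in plain Perl. It is only a simulation
under my own assumptions: the hashes standing in for interpreters and
the acquire()/release() helpers are hypothetical names of mine, not
mod_perl API. It just shows why taking from and returning to the head of
the available list keeps reusing the same few entries:

```perl
use strict;
use warnings;

# Hypothetical stand-ins for cloned interpreters: an id and a use counter.
my @available = map { +{ id => $_, uses => 0 } } 1 .. 10;
my @busy;

# Take an interpreter from the head of the available list.
sub acquire {
    my $interp = shift @available;
    push @busy, $interp;
    $interp->{uses}++;
    return $interp;
}

# Put it back at the head of the same list when the request is done.
sub release {
    my ($interp) = @_;
    @busy = grep { $_ != $interp } @busy;
    unshift @available, $interp;
}

# Simulate 100 rounds with at most five concurrent requests each.
for my $round (1 .. 100) {
    my @in_flight = map { acquire() } 1 .. 5;
    release($_) for @in_flight;
}

my @warm = grep { $_->{uses} > 0 } @available;
my @cold = grep { $_->{uses} == 0 } @available;
printf "warm: %d, cold: %d\n", scalar @warm, scalar @cold;  # warm: 5, cold: 5
```

The same five entries serve every round, which matches the book's point
that the other five stay small until they are actually needed.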

>>> I'm pretty sure there is no sequential guarantee (a queue/stack) when
>>> two controller threads use the same model instance, nor is the thread
>>> implementation aware that it should create a separate data segment for
>>> the second and beyond calls to ACCEPT_CONTEXT.
>>

And finally, this part seems to address some of my concerns in the OP:

<quote>
It's important to notice that the Perl ithreads implementation ensures
that Perl code is thread-safe, at least with respect to the Apache
threads in which it is running. However, it does not ensure that
functions and extensions that call into third-party C/C++ libraries
are thread-safe. In the case of non-thread-safe extensions, if it is
not possible to fix those routines, care needs to be taken to
serialize calls into such functions (either at the XS or Perl level).
See Perl 5.8.0's perlthrtut manpage.

Note that while Perl data is thread-private unless explicitly shared
and threads themselves are separate execution threads, the threads can
affect process-scope state, affecting all the threads. For example, if
one thread does chdir("/tmp"), the current working directory of all
threads is now /tmp. While each thread can correct its current working
directory by storing the original value, there are functions whose
process-scope changes cannot be undone. For example, chroot( ) changes
the root directory of all threads, and this change is not reversible.
Refer to the perlthrtut manpage for more information.
</quote>
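
The distinction in that last paragraph can be checked with a small
script. This is a sketch under a few assumptions: a perl built with
ithreads (otherwise "use threads" fails) and a Unix-like system, where
chdir applies to the whole process. The variable names are mine:

```perl
use strict;
use warnings;
use threads;
use Cwd qw(getcwd);
use File::Spec;

my $counter = 0;                  # ordinary variable, not shared
my $start   = getcwd();
my $tmp     = File::Spec->tmpdir;

my $thr = threads->create(sub {
    $counter = 42;                          # changes only this thread's copy
    chdir $tmp or die "chdir failed: $!";   # changes cwd for the whole process
    return $counter;
});
my $seen_in_thread = $thr->join;

# Perl data stayed thread-private; the working directory did not.
my $cwd_changed = getcwd() ne $start;

print "counter in child thread: $seen_in_thread\n";
print "counter in main thread:  $counter\n";
print "cwd changed under the main thread: ", ($cwd_changed ? "yes" : "no"), "\n";

chdir $start or die "chdir failed: $!";     # restore the original directory
```

The main thread's $counter is still 0 after the join, because the child
thread was working on its own copy, while the chdir is visible from the
main thread (unless the script was started in the temp directory to
begin with).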

>> No, hang on.. This works using perl's threads, so by default you have
>> 'nothing shared'..
>>
>> I may be entirely wrong here (someone correct me?)

So in conclusion, it seems reasonable to say that I should not worry
about the global vars (the Moose object attributes) in my Model
instance being overwritten by the ACCEPT_CONTEXT call, as within a given
interpreter this call only ever runs in sequence, for a single request
at a time. Does anyone disagree?

Sorry for making the thread so long. Since not too many people use
Catalyst in multi-threaded environments, many things seem to be taken
for granted, and I just want to make sure my code doesn't do anything
stupid, beyond my own code's potential stupidity, I mean ;-)

Thanks,
Alejandro