[Catalyst] Avoiding UTF8 in Catalyst

Marc SCHAEFER schaefer at alphanet.ch
Sat Nov 21 22:22:47 GMT 2009


My goal: no UTF-8. In short:

   - the Perl code, the data files, the template files and the UNIX
     locale are all in ISO-8859-1

   - the HTML result should be in ISO-8859-1
     (Content-Type: text/html; charset=iso-8859-1)

   - the Content-Length: should be correct.

First, I modified lib/MyApp/View/TT.pm as follows:

   __PACKAGE__->config(TEMPLATE_EXTENSION => '.tt',
                       DEFAULT_ENCODING   => 'ISO-8859-1',
                       WRAPPER => 'wrapper.tt');

Apparently all diacritic characters are now expanded into HTML entities,
which is functional but not optimal.  With FormFu, however, this
unnecessary expansion doesn't happen, which is fine.

I got the following result:

   - the HTML data is in ISO-8859-1 (or as HTML entities, which is
     acceptable as a work-around) as wanted
   - however, the HTTP header charset is UTF-8

After looking at line 45 of
it looks like the utf-8 charset HTTP header is hardcoded. I have thus modified
my lib/MyApp/Controller/Root.pm to do the following in
end : ActionClass('RenderView'):

   $c->response->content_type('text/html; charset=iso-8859-1');

With this, I got the following result:

   - the HTML data is in ISO-8859-1 as wanted (no change, logical)
   - the HTTP header charset is now the correct iso-8859-1
   - however, the Content-Length: sent is wrong.

After investigating, the Content-Length: is off by one for each non 7-bit
character, as if the ISO-8859-1 byte stream were sent as is but
converted internally to UTF-8 just to compute the (wrong) byte count.
Very strange.  Normally such a process should either output something
wrong or raise an error during the conversion.
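
To make the "one off per non 7-bit character" observation concrete, here
is a minimal standalone sketch in plain Perl (no Catalyst involved). The
sample string is made up for illustration. The bytes::length() line shows
only one plausible mechanism, namely counting the internal representation
of a string that Perl has upgraded to UTF-8; that this is what Catalyst
actually does is an assumption, not something verified here.

```perl
use strict;
use warnings;
use bytes ();                 # load bytes::length without enabling the pragma
use Encode qw(encode);

# Made-up sample text: 10 characters, two of them outside 7-bit ASCII.
my $text = "caf\xE9 cr\xE8me";

# Number of characters in the range 0x80-0xFF.
my $high = ( $text =~ tr/\x80-\xFF// );

# What actually goes on the wire if the body is sent as ISO-8859-1.
my $latin1_len = length encode( 'ISO-8859-1', $text );

# What the same text would occupy if it were encoded as UTF-8.
my $utf8_len = length encode( 'UTF-8', $text );

# Assumed mechanism: the string is held internally as UTF-8 and its
# internal byte length is counted instead of its character length.
my $upgraded = $text;
utf8::upgrade($upgraded);
my $internal_len = bytes::length($upgraded);

printf "non 7-bit characters : %d\n", $high;          # 2
printf "ISO-8859-1 byte count: %d\n", $latin1_len;    # 10
printf "UTF-8 byte count     : %d\n", $utf8_len;      # 12
printf "internal byte count  : %d\n", $internal_len;  # 12
```

The difference between the last two counts and the ISO-8859-1 count is
exactly the number of non 7-bit characters, which matches the symptom.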

My questions:

   - is there a better way to use the standard charset than to do all
     of the above hacks?

   - if not, how can I work around the content length in
     end : ActionClass('RenderView')?  Unfortunately, it looks like
     $c->response->body is undefined at this point, and
     $c->finalize_body() doesn't do anything useful.
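
For the second question, the kind of fix-up I have in mind would look
roughly like the following standalone sketch. latin1_body() is a
hypothetical helper, not a Catalyst function, and where it could be
called in the request cycle (given that the body is not yet available in
end) is exactly the open point.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: turn a Perl character string into ISO-8859-1
# bytes and return those bytes with the matching Content-Length value.
# Characters that do not exist in ISO-8859-1 are replaced by Encode's
# default substitution character.
sub latin1_body {
    my ($text) = @_;
    my $bytes = encode( 'ISO-8859-1', $text );
    return ( $bytes, length $bytes );
}

# Made-up sample body: 7 characters, two of them non 7-bit.
my ( $body, $content_length ) = latin1_body("d\xE9j\xE0 vu");

print "Content-Type: text/html; charset=iso-8859-1\n";
print "Content-Length: $content_length\n";
```

Once the body is an explicit byte string, its length and the number of
bytes sent can no longer disagree.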

Version info:
 Catalyst 5.80007 and 5.80013

PS: I wouldn't have noticed the Content-Length: issue if I hadn't used a
    reverse proxy.  With that reverse proxy and the standalone Catalyst
    server, you get 5-10 second hangs if the Content-Length is too big,
    which is what happens with this strange UTF-8 behaviour.  Without the
    proxy, the size is still wrong (as seen by wireshark != PageInfo
    Firefox), but the WWW client seems to compensate.

PS/2: the http://www.catb.org/~esr/faqs/smart-questions.html URL doesn't
      work currently, so maybe my question is unsmart.
