[Catalyst] Avoiding UTF8 in Catalyst
schaefer at alphanet.ch
Sat Nov 21 22:22:47 GMT 2009
My goal: no UTF8. In short:
- all the perl code, all the data files, all the template files and
the UNIX locale are all in ISO-8859-1
- the HTML result should be in ISO-8859-1
(Content-Type: text/html; charset=iso-8859-1)
- the Content-Length: should be correct.
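Outside of Catalyst, the three goals above amount to something like the following sketch. It only uses the core Encode module; the helper name latin1_response and the sample text are made up for illustration and are not part of Catalyst or of my application.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: turn a Perl character string into an ISO-8859-1
# byte string plus the two headers listed above.
sub latin1_response {
    my ($text) = @_;

    # FB_CROAK makes encode() die on characters that do not exist in
    # ISO-8859-1; LEAVE_SRC keeps encode() from modifying $text.
    my $bytes = encode('ISO-8859-1', $text,
                       Encode::FB_CROAK() | Encode::LEAVE_SRC());

    my %headers = (
        'Content-Type'   => 'text/html; charset=iso-8859-1',
        'Content-Length' => length $bytes,    # counted in bytes
    );
    return (\%headers, $bytes);
}

# Sample text with two non-ASCII characters, written with escapes so the
# example does not depend on the encoding of this file.
my ($headers, $bytes) = latin1_response("<p>d\x{e9}j\x{e0} vu</p>");
print "$_: $headers->{$_}\n" for sort keys %$headers;
```

The point of the sketch is only that the length is taken from the encoded byte string, so the header and the bytes sent cannot disagree.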
First, I modified lib/MyApp/View/TT.pm as follows:
   __PACKAGE__->config(TEMPLATE_EXTENSION => '.tt',
                       DEFAULT_ENCODING   => 'ISO-8859-1',
                       WRAPPER            => 'wrapper.tt');
Apparently all characters with diacritics are expanded into HTML
entities, which is functional but not optimal. With FormFu, however,
this unnecessary expansion doesn't happen, which is fine.
I got the following result:
- the HTML data is in ISO-8859-1 (or as HTML entities, which is
acceptable as a work-around) as wanted
- however the HTTP header charset is UTF8
After looking at line 45 of
it looks like the utf-8 charset HTTP header is hardcoded. I have thus modified
my lib/MyApp/Controller/Root.pm to do the following in
end : ActionClass('RenderView'):
With this, I got the following result:
- the HTML data is in ISO-8859-1 as wanted (no change, logical)
- the HTTP header charset is now the correct iso-8859-1
- however, the Content-Length: sent is wrong.
After investigating, the Content-Length: is off by one per non-7-bit
character, as if the standard iso-8859-1 byte stream was sent as
is but was internally converted to UTF-8 just to generate the byte
count, which is therefore wrong. Very strange. Normally such a process
should output something wrong or generate an error in the conversion. It
- is there a better way to use the standard charset than to do all
of the above hacks ?
- if not, how to work around the wrong Content-Length in
end : ActionClass('RenderView') ? Unfortunately, it looks like
$c->result->body is undefined at this point, and that
$c->finalize_body() doesn't do anything useful.
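The one-byte-per-character discrepancy described above can be reproduced without Catalyst. The following is a minimal sketch using only the core Encode module; the sample string is made up, and nothing here claims to show what Catalyst does internally.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Made-up page body with three non-ASCII characters, written with
# escapes so the example does not depend on the encoding of this file.
my $body = "caf\x{e9} cr\x{e8}me \x{e0} emporter";

# Bytes that go on the wire when the page is sent as ISO-8859-1.
my $latin1_bytes = encode('ISO-8859-1', $body);

# Bytes the same text would occupy if it were encoded as UTF-8.
my $utf8_bytes = encode('UTF-8', $body);

# Number of characters outside the 7-bit ASCII range.
my $non_ascii = () = $body =~ /[^\x00-\x7f]/g;

printf "ISO-8859-1 length: %d\n", length $latin1_bytes;
printf "UTF-8 length:      %d\n", length $utf8_bytes;
printf "non-ASCII chars:   %d\n", $non_ascii;
```

Every character in the range 0x80-0xFF takes one byte in ISO-8859-1 and two bytes in UTF-8, so a length computed on a UTF-8 version of the body is too big by exactly the number of such characters.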
Catalyst 5.80007 and 5.80013
PS: I wouldn't have noticed the Content-Length: issue if I hadn't used a
reverse proxy. With that reverse proxy and the standalone Catalyst
server, you get hangs of 5-10 seconds if the Content-Length is too big,
which is what happens with this strange UTF8 behaviour. Without the
proxy, the size is still wrong (the size seen by wireshark != the one
shown in Firefox's Page Info), but the WWW client seems to compensate.
PS/2: the http://www.catb.org/~esr/faqs/smart-questions.html URL doesn't
work currently, so maybe my question is unsmart.