[Catalyst] mod_perl converts latin1 to utf8 !?

Bjørn-Helge Mevik bhx6 at mevik.net
Sun Dec 28 11:17:40 GMT 2008


Jonathan Rockway wrote:

> When you are working with iso-8859-1, you need to do exactly the same
> thing.  Everything you read needs to be decoded, and everything you
> need to write needs to be encoded.  (In this case, it is sort of an
> uphill battle since most people use Unicode now, and that is the "code
> path" with the most testing.  There are probably places that helpfully
> treat your latin-1 as utf-8, which is definitely incorrect of it.)

I actually do not care which encoding is used for storing or
displaying my data.  I chose iso-8859-1 because I'm used to it, and I
thought it would be easier than using UTF-8.  (I saw quite a few emails
asking how to handle UTF-8, so I guessed it wouldn't be
straightforward.  Also, my first attempt at following the advice in
<http://dev.catalystframework.org/wiki/gettingstarted/tutorialsandhowtos/using_unicode> was unsuccessful.)

> Decoding is probably a no-op, so focus on the encoding part.  Take a
> look at Catalyst::Plugin::Unicode (specifically finalize_body, I
> think), and change the Encode::encode('utf-8', ...) to
> Encode::encode('iso-8859-1', ...)

I tried modifying Catalyst::Plugin::Unicode the following way:
062016150213:/usr/share/perl5/Catalyst/Plugin# diff Unicode.pm.orig Unicode.pm
3a4
> use Encode qw(encode decode);
22c23
<     utf8::encode( $c->response->{body} );
---
>     encode('ISO-8859-1', $c->response->{body} );
38c39
<         utf8::decode($_) for ( ref($value) ? @{$value} : $value );
---
>         Encode::decode('ISO-8859-1', $_) for ( ref($value) ? @{$value} : $value );

When running under the development server, this seemed to be a no-op:
everything still worked perfectly.
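
A possible explanation for the no-op, offered as a sketch rather than a
diagnosis: utf8::encode and utf8::decode modify their argument in place,
whereas Encode::encode and Encode::decode return the converted string and
(when no CHECK argument is given) leave the argument alone.  Called in void
context, as in the diff above, their results are discarded.  The variable
names below are illustrative only.

```perl
use strict;
use warnings;
use Encode ();

# A character string with one non-ASCII character (U+00F8).
my $chars = "Bj\x{f8}rn";

# utf8::encode works in place: the copy becomes 6 UTF-8 octets.
my $in_place = $chars;
utf8::encode($in_place);

# Encode::encode returns its result; in void context nothing changes.
my $untouched = $chars;
Encode::encode('UTF-8', $untouched);

# Assigning the return value is what actually yields the encoded octets.
my $latin1 = Encode::encode('ISO-8859-1', $chars);

# The same applies to decoding in a loop: assign back to the aliased $_.
my @params = ("Bj\xc3\xb8rn", "caf\xc3\xa9");   # UTF-8 octets
$_ = Encode::decode('UTF-8', $_) for @params;

printf "%d %d %d %d\n",
    length($in_place), length($untouched), length($latin1), length($params[0]);
```

If this is what happened, the modified plugin neither encoded the response
body nor decoded the parameters, which would fit the behaviour described.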

Under mod_perl, it was almost a no-op as well.  The only difference
was that when entering non-ASCII letters in a form field and storing
it, the entered characters were now correctly handled -- however, any
_existing_ non-ASCII character was now stored in the database as
UTF-8.


In an attempt to "stick to the broad path", I tried using the
_unmodified_ Catalyst::Plugin::Unicode (and removed my modified
process() method of TT).  (I still have mysql and all files in latin1,
though.)  Now both Apache/mod_perl and the development server work
identically (which is progress, I think :-):  All characters are
displayed correctly (as UTF-8), but non-ASCII characters entered into
a form get stored as UTF-8 in mysql.

So perhaps my best bet now is to try and get my data properly encoded
on the way to mysql?
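
A minimal sketch of what that could look like, assuming the form data
arrives as UTF-8 octets and the latin1 tables should receive ISO-8859-1
octets.  The helper name utf8_octets_to_latin1 is hypothetical, and the
sketch leaves out the database connection and its character set
settings entirely.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: convert UTF-8 octets to ISO-8859-1 octets.
# FB_CROAK makes both steps die on malformed or unmappable input
# instead of silently substituting characters.
sub utf8_octets_to_latin1 {
    my ($octets) = @_;
    my $chars = Encode::decode('UTF-8', $octets, Encode::FB_CROAK);
    return Encode::encode('ISO-8859-1', $chars, Encode::FB_CROAK);
}

# "Bjørn" as UTF-8 octets, as a browser might submit it.
my $from_form = "Bj\xc3\xb8rn";
my $for_db    = utf8_octets_to_latin1($from_form);

# A character with no ISO-8859-1 equivalent (U+20AC, the euro sign)
# cannot be stored this way; the helper dies rather than corrupt it.
my $euro_ok = eval { utf8_octets_to_latin1("\xe2\x82\xac"); 1 };

printf "%d octets, euro convertible: %s\n",
    length($for_db), ($euro_ok ? "yes" : "no");
```

Dying on unmappable characters is a deliberate choice here: it surfaces
input that latin1 cannot hold instead of storing a substitute.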

-- 
Bjørn-Helge Mevik


