[Catalyst] mod_perl converts latin1 to utf8 !?
Bjørn-Helge Mevik
bhx6 at mevik.net
Sun Dec 28 11:17:40 GMT 2008
Jonathan Rockway wrote:
> When you are working with iso-8859-1, you need to do exactly the same
> thing. Everything you read needs to be decoded, and everything you
> need to write needs to be encoded. (In this case, it is sort of an
> uphill battle since most people use Unicode now, and that is the "code
> path" with the most testing. There are probably places that helpfully
> treat your latin-1 as utf-8, which is definitely incorrect of it.)
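The decode-on-input / encode-on-output rule in the quoted advice can be sketched with Perl's core Encode module. This is a minimal illustration, not Catalyst code. The sample string "Bjørn" is written with escapes so the script's own file encoding doesn't matter. The point is that inside the program you work with characters, and only at the boundaries do you convert from or to octets in a particular encoding.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Octets as they might arrive from a latin-1 source (form post, file, DB):
# 0xF8 is the o-with-stroke in ISO-8859-1, so this is "Bjørn".
my $octets_in = "Bj\xF8rn";

# Input boundary: decode octets into a Perl character string.
my $chars = decode('ISO-8859-1', $octets_in);

# Inside the program, work with characters only.
my $len = length $chars;    # 5 characters

# Output boundary: encode characters into octets of the target encoding.
my $latin1_out = encode('ISO-8859-1', $chars);   # 5 octets
my $utf8_out   = encode('UTF-8',      $chars);   # 6 octets (the o-with-stroke needs 2)

printf "chars=%d latin1=%d utf8=%d\n", $len, length $latin1_out, length $utf8_out;
# prints "chars=5 latin1=5 utf8=6"
```

By default, encode() replaces characters that have no ISO-8859-1 mapping (for example the euro sign) with '?'. That is one reason latin-1 is the less-travelled path.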
I actually do not care which encoding is used for storing or
displaying my data. I chose iso-8859-1 because I'm used to it, and I
thought it would be easier than using UTF-8. (I saw quite a few emails
asking how to handle UTF-8, so I guessed it wouldn't be
straightforward. Also, my first attempt at following the advice in
<http://dev.catalystframework.org/wiki/gettingstarted/tutorialsandhowtos/using_unicode> was unsuccessful.)
> Decoding is probably a no-op, so focus on the encoding part. Take a
> look at Catalyst::Plugin::Unicode (specifically finalize_body, I
> think), and change the Encode::encode('utf-8', ...) to
> Encode::encode('iso-8859-1', ...)
I tried modifying Catalyst::Plugin::Unicode the following way:
062016150213:/usr/share/perl5/Catalyst/Plugin# diff Unicode.pm.orig Unicode.pm
3a4
> use Encode qw(encode decode);
22c23
< utf8::encode( $c->response->{body} );
---
> encode('ISO-8859-1', $c->response->{body} );
38c39
< utf8::decode($_) for ( ref($value) ? @{$value} : $value );
---
> Encode::decode('ISO-8859-1', $_) for ( ref($value) ? @{$value} : $value );
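A likely reason the modified lines behave like a no-op is that the two APIs differ. utf8::encode() and utf8::decode() change their argument in place. Encode::encode() and Encode::decode() instead return a new string and, with the default CHECK, leave the argument untouched. Called in void context as in the diff above, their result is simply discarded. The sketch below shows the difference. The plugin lines would presumably need to assign the result back, e.g. `$c->response->{body} = encode('ISO-8859-1', $c->response->{body});`, but that form is an untested assumption and not part of the plugin.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $body = "Bj\x{f8}rn";    # a character string, as a response body might hold

# utf8::encode works in place: $copy itself becomes UTF-8 octets.
my $copy = $body;
utf8::encode($copy);        # $copy is now "Bj\xC3\xB8rn" (6 octets)

# Encode::encode returns its result; in void context nothing changes.
my $unchanged = $body;
encode('ISO-8859-1', $unchanged);          # result thrown away

# The result has to be assigned back explicitly.
my $fixed = $body;
$fixed = encode('ISO-8859-1', $fixed);     # now 5 latin-1 octets

# Same for decode() on request parameters. In the loop, $_ is an alias,
# so assigning to it updates the element (@values is a stand-in here).
my @values = ("Bj\xF8rn");
$_ = decode('ISO-8859-1', $_) for @values;
```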
When running under the development server, this seemed to be a no-op:
everything still worked perfectly.
Under mod_perl, it was almost a no-op as well. The only difference
was that when I entered non-ASCII letters in a form field and stored
them, the entered characters were now handled correctly; however, any
_existing_ non-ASCII characters were now stored in the database as
UTF-8.
In an attempt to "stick to the broad path", I tried using the
_unmodified_ Catalyst::Plugin::Unicode (and removed my modified
process() method of TT). (I still have MySQL and all files in latin1,
though.) Now both Apache/mod_perl and the development server work
identically (which is progress, I think :-): All characters are
displayed correctly (as UTF-8), but non-ASCII characters entered into
a form get stored as UTF-8 in MySQL.
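The symptom "stored as UTF-8 in a latin1 table" can be reproduced without a database. This sketch assumes the form value reaches MySQL as UTF-8 octets over a latin1 connection. Under that assumption, each octet is stored as one latin-1 character, so the single o-with-stroke becomes the two characters U+00C3 U+00B8 (displayed as "Ã¸"). Reversing the two steps recovers the original text, which may help with rows that have already been written this way.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $chars = "Bj\x{f8}rn";                       # "Bjørn" as characters

# Assumption: the value is sent to the server as UTF-8 octets ...
my $utf8_octets = encode('UTF-8', $chars);      # "Bj\xC3\xB8rn"

# ... and a latin1 column reads each octet as one character.
my $stored = decode('ISO-8859-1', $utf8_octets);

printf "%d characters: %s\n", length $stored,
    join ' ', map { sprintf 'U+%04X', ord } split //, $stored;
# prints "6 characters: U+0042 U+006A U+00C3 U+00B8 U+0072 U+006E"

# Undoing the two steps gives back the original string.
my $repaired = decode('UTF-8', encode('ISO-8859-1', $stored));
```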
So perhaps my best bet now is to try to get my data properly encoded
on the way to MySQL?
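Encoding on the way to MySQL could look roughly like the sketch below. It keeps the tables in latin1 and converts explicitly at the database boundary. The helpers to_db() and from_db() and the table name `people` are hypothetical; they belong to neither Catalyst nor DBI. The alternative is to switch the tables and the connection to UTF-8 (DBD::mysql has a `mysql_enable_utf8` connect attribute for this), which follows the better-tested path mentioned in the quoted reply.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: call on every value just before it is bound into SQL.
sub to_db {
    my ($chars) = @_;
    return undef unless defined $chars;
    # FB_CROAK dies on characters latin-1 cannot represent (e.g. the euro
    # sign) instead of silently writing '?'; LEAVE_SRC stops encode() from
    # modifying its input.
    return encode('ISO-8859-1', $chars, Encode::FB_CROAK | Encode::LEAVE_SRC);
}

# Hypothetical helper: call on every value fetched back from the database.
sub from_db {
    my ($octets) = @_;
    return undef unless defined $octets;
    return decode('ISO-8859-1', $octets);
}

# A form value after Catalyst::Plugin::Unicode has decoded it:
my $param  = "Bj\x{f8}rn";
my $stored = to_db($param);       # 5 octets, 0xF8 for the o-with-stroke
my $back   = from_db($stored);    # the original character string again

# With DBI (not executed here; $dbh is assumed to be a connected handle):
#   $dbh->do('INSERT INTO people (name) VALUES (?)', undef, to_db($param));

# A character outside latin-1 is rejected rather than mangled.
my $ok = eval { to_db("\x{20ac}"); 1 };
```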
--
Bjørn-Helge Mevik
More information about the Catalyst mailing list