[Catalyst] mod_perl converts latin1 to utf8 !?

Marius Kjeldahl mariusauto-catalyst at kjeldahl.net
Sun Dec 28 14:40:15 GMT 2008


I've followed a similar path to yours, and in the end the best choice
for my projects was to make sure everything is UTF-8.

That said, the specific bug that bit me in the ass was Firefox 3.0
mangling character-set encodings in anything using Ajax-style calls.
Fortunately, 3.1 fixed it.

Just mentioning this in case it is related to your struggles.

Regards,

Marius K.

Bjørn-Helge Mevik wrote:
> Jonathan Rockway wrote:
> 
>> When you are working with iso-8859-1, you need to do exactly the same
>> thing.  Everything you read needs to be decoded, and everything you
>> write needs to be encoded.  (In this case, it is sort of an uphill
>> battle since most people use Unicode now, and that is the "code path"
>> with the most testing.  There are probably places that helpfully treat
>> your latin-1 as UTF-8, which is definitely incorrect.)
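A minimal Perl sketch of "decode everything you read, encode everything you write" for latin-1. It assumes in-memory filehandles with PerlIO `:encoding` layers as stand-ins for a real form post and HTTP response; the data and variable names are made up for illustration. Working on decoded characters is what lets string functions behave: on the decoded string, `uc` maps U+00F8 to U+00D8 (ø to Ø).

```perl
use strict;
use warnings;

# Stand-in for latin-1 bytes arriving from a form or file: "Bj\xF8rn\n".
my $input_bytes = "Bj\xF8rn\n";

# Decode on the way in: the :encoding layer turns bytes into characters.
open my $in, '<:encoding(iso-8859-1)', \$input_bytes or die "open: $!";
my $name = <$in>;
close $in;
chomp $name;

# Work on characters, not bytes.
my $shout = uc $name;    # "BJ\x{D8}RN"

# Encode on the way out: the same layer turns characters back into bytes.
my $output_bytes = '';
open my $out, '>:encoding(iso-8859-1)', \$output_bytes or die "open: $!";
print {$out} $shout, "\n";
close $out;              # flushes the encoded bytes into $output_bytes
```

The point of the layers is that the decode and encode happen once, at the boundary, and everything in between only ever sees characters.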
> 
> I actually do not care which encoding is used for storing or
> displaying my data.  I chose iso-8859-1 because I'm used to it, and I
> thought it would be easier than using UTF-8.  (I saw quite a few emails
> asking how to handle UTF-8, so I guessed it wouldn't be
> straightforward.  Also, my first attempt at following the advice in
> <http://dev.catalystframework.org/wiki/gettingstarted/tutorialsandhowtos/using_unicode> was unsuccessful.)
> 
>> Decoding is probably a no-op, so focus on the encoding part.  Take a
>> look at Catalyst::Plugin::Unicode (specifically finalize_body, I
>> think), and change the Encode::encode('utf-8', ...) to
>> Encode::encode('iso-8859-1', ...).
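A small illustration of why decoding is close to a no-op for latin-1 and why only the encode step changes what goes over the wire. It calls Encode directly rather than going through the plugin, so it only approximates what a modified finalize_body would emit. Whichever encoding is emitted, the charset in the Content-Type header has to name the same one.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# For latin-1, decoding keeps the code points: byte 0xF8 becomes U+00F8.
my $body = decode('iso-8859-1', "Bj\xF8rn");

# The bytes sent to the browser depend only on the encode step.
my $as_utf8   = encode('utf-8',      $body);   # "Bj\xC3\xB8rn", 6 bytes
my $as_latin1 = encode('iso-8859-1', $body);   # "Bj\xF8rn",     5 bytes

printf "utf-8: %d bytes, iso-8859-1: %d bytes\n",
    length $as_utf8, length $as_latin1;
# prints "utf-8: 6 bytes, iso-8859-1: 5 bytes"
```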
> 
> I tried modifying Catalyst::Plugin::Unicode the following way:
> 062016150213:/usr/share/perl5/Catalyst/Plugin# diff Unicode.pm.orig Unicode.pm
> 3a4
>> use Encode qw(encode decode);
> 22c23
> <     utf8::encode( $c->response->{body} );
> ---
>>     encode('ISO-8859-1', $c->response->{body} );
> 38c39
> <         utf8::decode($_) for ( ref($value) ? @{$value} : $value );
> ---
>>         Encode::decode('ISO-8859-1', $_) for ( ref($value) ? @{$value} : $value );
> 
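One thing worth checking in the patch above: `utf8::encode` and `utf8::decode` modify their argument in place, but `Encode::encode` and `Encode::decode` return a new string and, with the default CHECK argument, leave their argument untouched. The two patched lines therefore throw their results away, which may account for part of the "no-op" behaviour described below. The sketch uses UTF-8 only because the length change makes the difference visible; the same holds for 'ISO-8859-1'.

```perl
use strict;
use warnings;
use Encode ();

# A character string containing U+00F8 (LATIN SMALL LETTER O WITH STROKE).
my $name = "Bj\x{F8}rn";

# utf8::encode works in place: $in_place becomes 6 UTF-8 bytes.
my $in_place = $name;
utf8::encode($in_place);

# Encode::encode returns a new string; calling it in void context,
# as the patched line does, leaves $discarded unchanged.
my $discarded = $name;
Encode::encode('UTF-8', $discarded);

# The fix is to assign the return value back.
my $assigned = $name;
$assigned = Encode::encode('UTF-8', $assigned);

printf "%d %d %d\n", length $in_place, length $discarded, length $assigned;
# prints "6 5 6"

# Same for the decode loop: assign to $_, which aliases each element.
my @values = ("Bj\xC3\xB8rn");
$_ = Encode::decode('UTF-8', $_) for @values;
```

In the plugin, the analogous change would presumably be `$c->response->{body} = encode('ISO-8859-1', $c->response->{body});` and `$_ = Encode::decode('ISO-8859-1', $_) for ...;`, though that has not been tested against Catalyst here.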
> When running under the development server, this seemed to be a no-op:
> everything still worked perfectly.
> 
> Under mod_perl, it was almost a no-op as well.  The only difference
> was that when I entered non-ASCII letters in a form field and stored
> them, the entered characters were now handled correctly; however, any
> _existing_ non-ASCII characters were now stored in the database as
> UTF-8.
>
> In an attempt to "stick to the broad path", I tried using the
> _unmodified_ Catalyst::Plugin::Unicode (and removed my modified TT
> process() method).  (I still have mysql and all files in latin1,
> though.)  Now both Apache/mod_perl and the development server work
> identically (which is progress, I think :-):  All characters are
> displayed correctly (as UTF-8), but non-ASCII characters entered into
> a form get stored as UTF-8 in mysql.
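One common way characters end up "stored as UTF-8" in a latin1 column is double encoding: the string is encoded to UTF-8 bytes, and MySQL then keeps each of those bytes as a separate latin1 character. Whether that is exactly what happens in this setup is an assumption, but the effect is easy to reproduce in plain Perl:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $name = "Bj\x{F8}rn";

# Somewhere on the way to the database the string is UTF-8 encoded...
my $utf8_bytes = encode('UTF-8', $name);            # "Bj\xC3\xB8rn"

# ...and a latin1 column reads those two bytes as two characters.
my $as_stored = decode('iso-8859-1', $utf8_bytes);   # 6 characters

binmode STDOUT, ':encoding(UTF-8)';
print "$as_stored\n";                                # prints "BjÃ¸rn"

# Reversing the two steps recovers the original string.
my $repaired = decode('UTF-8', encode('iso-8859-1', $as_stored));
```

If the stored values look like the "BjÃ¸rn" form, the extra UTF-8 encode is the step to track down.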
> 
> So perhaps my best bet now is to try and get my data properly encoded
> on the way to mysql?
> 
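A sketch of encoding on the way to mysql, assuming the database connection does no conversion of its own and the column is latin1. `to_latin1_bytes` is a hypothetical helper, not part of Catalyst or DBI. `FB_CROAK` makes it die on characters latin-1 cannot represent instead of quietly substituting `?`, and `LEAVE_SRC` keeps Encode from modifying the input.

```perl
use strict;
use warnings;
use Encode qw(encode FB_CROAK LEAVE_SRC);

# Hypothetical helper: character string -> ISO-8859-1 bytes, called just
# before the value is bound into an INSERT/UPDATE on a latin1 column.
sub to_latin1_bytes {
    my ($chars) = @_;
    return encode('iso-8859-1', $chars, FB_CROAK | LEAVE_SRC);
}

my $bytes = to_latin1_bytes("Bj\x{F8}rn");    # 5 bytes: "Bj\xF8rn"

# U+20AC (EURO SIGN) has no latin-1 code point, so encode croaks.
my $ok  = eval { to_latin1_bytes("\x{20AC}100"); 1 };
my $msg = $ok ? "encoded\n" : "refused: not representable in latin-1\n";
print $msg;    # prints "refused: not representable in latin-1"
```

Failing loudly is a deliberate choice: with a latin1 database, a silently substituted `?` is much harder to trace back than an exception at insert time.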