[Catalyst] mod_perl converts latin1 to utf8 !?
Zbigniew Lukasiak
zzbbyy at gmail.com
Mon Dec 29 08:44:09 GMT 2008
On Sun, Dec 28, 2008 at 12:17 PM, Bjørn-Helge Mevik <bhx6 at mevik.net> wrote:
snip snip
> I tried modifying Catalyst::Plugin::Unicode the following way:
> 062016150213:/usr/share/perl5/Catalyst/Plugin# diff Unicode.pm.orig Unicode.pm
> 3a4
>> use Encode qw(encode decode);
> 22c23
> < utf8::encode( $c->response->{body} );
> ---
>> encode('ISO-8859-1', $c->response->{body} );
> 38c39
> < utf8::decode($_) for ( ref($value) ? @{$value} : $value );
> ---
>> Encode::decode('ISO-8859-1', $_) for ( ref($value) ? @{$value} : $value );
>
> When running under the development server, this seemed to be a no-op:
> everything still worked perfectly.
>
> Under mod_perl, it was almost a no-op as well. The only difference
> was that when entering non-ASCII letters in a form field and storing
> it, the entered characters were now correctly handled -- however, any
> _existing_ non-ASCII character now became stored in the data base as
> UTF-8.
Here is my wild guess of what happened: in some circumstances the
internal representation of Perl strings can be Latin-1, and if you
don't encode such a string when writing to the database you'll get
Latin-1 in the database. In the most common case, though, the internal
representation will be UTF-8, and that is what you'll have in the DB
when writing to it without any encoding. In theory you should not rely
on that, because it is *internal representation*. You need to encode
every output that goes from the Perl program to the outside world (and
decode every input that comes in), including the database. You do this
separately for each output (input), and you can use a different
encoding for each (like UTF-8 for the web pages and Latin-1 for the
DB). That said, I don't know much about the practical side of that;
for my work I just always use UTF-8 and pg_enable_utf8.
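
To make the "decode every input, encode every output" rule concrete,
here is a minimal sketch using the core Encode module. The sample
string and the variable names are made up for illustration and are
not taken from Catalyst::Plugin::Unicode. Note that Encode's encode()
and decode() return their result and leave the argument alone, whereas
utf8::encode() and utf8::decode() work in place; that difference might
be part of why the patched lines quoted above looked like a no-op, but
that is only a guess.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical input: "Bjørn" as Latin-1 bytes (ø is the single byte 0xF8).
my $latin1_bytes = "Bj\xF8rn";

# Input boundary: turn bytes into a Perl character string.
my $chars = decode('ISO-8859-1', $latin1_bytes);

# Output boundaries: pick the encoding separately for each destination.
my $for_web = encode('UTF-8', $chars);       # ø becomes two bytes: 0xC3 0xB8
my $for_db  = encode('ISO-8859-1', $chars);  # ø stays one byte: 0xF8

# encode() returned new byte strings; $chars itself is unchanged.
printf "chars: %d, web bytes: %d, db bytes: %d\n",
    length($chars), length($for_web), length($for_db);
# prints "chars: 5, web bytes: 6, db bytes: 5"
```

The point of the sketch is that the program only ever looks at $chars
in between; which bytes Perl uses to store it internally never matters.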
--
Zbigniew Lukasiak
http://brudnopis.blogspot.com/
http://perlalchemy.blogspot.com/