[Catalyst] mod_perl converts latin1 to utf8 !?

Mon Dec 29 08:44:09 GMT 2008

On Sun, Dec 28, 2008 at 12:17 PM, Bjørn-Helge Mevik <bhx6 at mevik.net> wrote:

snip snip

> I tried modifying Catalyst::Plugin::Unicode the following way:
> 062016150213:/usr/share/perl5/Catalyst/Plugin# diff Unicode.pm.orig Unicode.pm
> 3a4
>> use Encode qw(encode decode);
> 22c23
> <     utf8::encode( $c->response->{body} );
> ---
>>     encode('ISO-8859-1', $c->response->{body} );
> 38c39
> <         utf8::decode($_) for ( ref($value) ? @{$value} : $value );
> ---
>>         Encode::decode('ISO-8859-1', $_) for ( ref($value) ? @{$value} : $value );
>
> When running under the development server, this seemed to be a no-op:
> everything still worked perfectly.
>
> Under mod_perl, it was almost a no-op as well.  The only difference
> was that when entering non-ASCII letters in a form field and storing
> it, the entered characters were now correctly handled -- however, any
> _existing_ non-ASCII character now became stored in the database as
> UTF-8.

Here is my wild guess at what happened: in some circumstances the
internal representation of a Perl string is Latin-1, and if you don't
encode it when writing to the database, you'll get Latin-1 in the
database.  In the most common case, though, the internal
representation is UTF-8, and that is what you'll have in the DB when
you write to it without any encoding.  In theory you should not rely
on that, because it is the *internal representation*.  You need to
encode every output that goes from the Perl program to the outside
world, and decode every input that comes in, and that includes the
database.  Each output (or input) is handled separately, so you can
use different encodings (for example UTF-8 for the web pages and
Latin-1 for the DB).  That said, I don't know much about the practical
side of this: for my own work I just always use UTF-8 and
pg_enable_utf8.
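
A minimal sketch of that decode-on-input, encode-on-output idea, using
only the core Encode module.  The variable names and the sample string
are made up for illustration, and the choice of UTF-8 for the web side
and Latin-1 for the DB side is just the example from above, not a
recommendation:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical input: raw bytes as a UTF-8 web form might deliver them.
# The letter "ø" arrives as the two bytes 0xC3 0xB8.
my $from_web = "Bj\xC3\xB8rn";

# Decode the input into a Perl character string.  decode() returns the
# result, so it has to be assigned somewhere.
my $name = decode('UTF-8', $from_web);

# Encode separately for each destination.  encode() also returns the
# result, so each call is assigned to its own variable.
my $for_web = encode('UTF-8', $name);         # assumed page encoding
my $for_db  = encode('ISO-8859-1', $name);    # assumed DB encoding

printf "chars=%d web_bytes=%d db_bytes=%d\n",
    length($name), length($for_web), length($for_db);
# prints "chars=5 web_bytes=6 db_bytes=5"
```

The point of the sketch is that the program never depends on how Perl
happens to store $name internally; the byte layout is chosen
explicitly at each boundary.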

-- 
Zbigniew Lukasiak
http://brudnopis.blogspot.com/
http://perlalchemy.blogspot.com/