[Catalyst] mod_perl converts latin1 to utf8 !?

Bjørn-Helge Mevik bhx6 at mevik.net
Mon Dec 29 13:07:22 GMT 2008


Zbigniew Lukasiak wrote:

> Here is my wild guess of what happened: in some circumstances the
> internal representation of Perl strings can be latin1 - and if you
> don't encode it when writing to the database you'll get latin1 in the
> database - but for the most common case the internal representation
> will be utf8 - and that you'll have in the db when writing to it
> without any encoding.

This is my guess as well.  With Catalyst::Plugin::Unicode, everything
that comes into the app from the browser seems to have UTF-8 as its
internal representation, and everything that comes from mysql seems to
have ISO-8859-1 as its internal representation.

Thus I've found a hack that seems to work for me:  I use
on_connect_do => [ "set character_set_client = 'utf8'" ] in
connect_info.  This tells mysql to expect UTF-8 from the client.
("set names 'utf8'" would also set the output to UTF-8, so I can't use
that.)  It is recommended to also set mysql_enable_utf8 => 1, but I
still haven't seen any effect of that setting (my DBD::mysql is 4.008,
so it should be new enough).
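
For reference, a minimal sketch of where these two settings might sit in
a Catalyst::Model::DBIC::Schema model.  The package name, schema class,
DSN, and credentials are hypothetical placeholders; only the
on_connect_do statement and the mysql_enable_utf8 flag come from the
description above.

```perl
package MyApp::Model::DB;    # hypothetical model name
use strict;
use warnings;
use base 'Catalyst::Model::DBIC::Schema';

__PACKAGE__->config(
    schema_class => 'MyApp::Schema',      # hypothetical schema class
    connect_info => [
        'dbi:mysql:database=myapp',       # hypothetical DSN
        'myuser',                         # hypothetical credentials
        'mypassword',
        # DBI/DBD::mysql attributes
        { mysql_enable_utf8 => 1 },
        # DBIx::Class options: run once per connection so that mysql
        # treats what the client sends as UTF-8
        { on_connect_do => [ "set character_set_client = 'utf8'" ] },
    ],
);

1;
```

Only character_set_client is changed here, so the character set mysql
uses for results stays at whatever the server default is.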

This will probably break when I move the app to the production
server. :-)

> In theory you should not rely on that - because
> it is *internal representation*.   You need to encode every output
> (and decode every input) that comes from the Perl program to the
> outside world - including the database. For each output (input) you do
> it separately and you can use different encoding (like UTF-8 for the
> web pages and Latin-1 for the DB).

I heartily agree.  Unfortunately, so far I haven't been able to figure
out how to get the proper encode()/decode() when using
Catalyst::Model::DBIC::Schema.
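
Outside of DBIC, the decode-on-input / encode-on-output principle can be
sketched with the core Encode module.  This is only an illustration of
the idea, not a way to hook it into Catalyst::Model::DBIC::Schema; the
sample byte string and variable names are made up for the example.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Bytes as they might arrive from a Latin-1 source: "Bjørn" with
# ø as the single byte 0xF8 (illustrative data).
my $db_bytes = "Bj\xF8rn";

# Decode input: Latin-1 bytes become a Perl character string.
my $text = decode('ISO-8859-1', $db_bytes);

# Encode output separately for each destination.
my $for_web = encode('UTF-8', $text);         # ø becomes 0xC3 0xB8
my $for_db  = encode('ISO-8859-1', $text);    # ø becomes 0xF8 again

printf "chars=%d web_bytes=%d db_bytes=%d\n",
    length($text), length($for_web), length($for_db);
# prints: chars=5 web_bytes=6 db_bytes=5
```

The character string in $text is the same regardless of how Perl stores
it internally; only the explicit encode() calls decide what bytes leave
the program.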

-- 
Bjørn-Helge Mevik


