[Dbix-class] Unicode conversion problems

Matias E. Fernandez pisco at gmx.ch
Mon Jul 5 21:02:02 GMT 2010


Hello Jesse

I'm pretty sure your data has been UTF-8 encoded twice. Consider this example:

use strict;
use warnings;

use Encode;

# $string is UTF-8, but Perl doesn't know
my $string = 'Pérez-Reverte, Arturo Кири́ллица ქართული  汉字 / 漢';
# $double_utf8 contains the double UTF-8 encoded string
# note that this is an implicit ISO-8859-1 to UTF-8 conversion
my $double_utf8 = Encode::encode('UTF-8', $string);

print "double encoded UTF-8:\n", "$double_utf8\n\n";

# undo the outer UTF-8 layer; the result is one character per original byte
my $latin1_chars = Encode::decode('UTF-8', $double_utf8);
# encode those characters as ISO-8859-1 to get the original UTF-8 bytes back
my $double_utf8_to_latin1 = Encode::encode('ISO-8859-1', $latin1_chars);

print "double UTF-8 to ISO-8859-1:\n", "$double_utf8_to_latin1\n\n";

So why is your data in the database double-encoded UTF-8? The problem is that you're not using the mysql_enable_utf8 option (see the DBD::mysql documentation). If you don't pass that option as part of the call to 'connect()', DBD::mysql will configure the connection in a way that makes MySQL believe it's being sent ISO-8859-1. Because your table is configured to store character data as UTF-8, MySQL converts the received data from ISO-8859-1 to UTF-8. There you have double-encoded UTF-8!
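
To make the byte-level effect concrete, here is a minimal sketch that reproduces the conversion described above for a single character. The helper hex_bytes and the variable names are made up for illustration; the conversion is simulated with Encode and does not talk to MySQL.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: render a byte string as space-separated hex pairs.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02X', ord } split //, $bytes;
}

# "é" (U+00E9) correctly encoded as UTF-8: bytes C3 A9.
my $utf8_once = "\xC3\xA9";

# Simulate a receiver that believes the bytes are ISO-8859-1:
# each byte is read as one Latin-1 character and encoded as UTF-8 again.
my $utf8_twice = Encode::encode('UTF-8',
    Encode::decode('ISO-8859-1', $utf8_once));

print hex_bytes($utf8_once),  "\n";   # C3 A9
print hex_bytes($utf8_twice), "\n";   # C3 83 C2 A9
```

The two bytes of the original "é" become four bytes, which show up as "Ã©" when they are later read back as UTF-8.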

The solution is simply to use mysql_enable_utf8 as part of the call to 'connect()'. If you're using DBIx::Class, I also recommend disabling the mysql_auto_reconnect option; this will save you a lot of headaches.
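
As a rough sketch, the two options could be passed like this. The DSN, the credentials and the schema class My::Schema are placeholders, not taken from your setup; only the attribute names mysql_enable_utf8 and mysql_auto_reconnect come from DBD::mysql.

```perl
use strict;
use warnings;
use DBI;

# Placeholder DSN and credentials; replace with your own.
my $dsn = 'DBI:mysql:database=mydb;host=localhost';
my ($user, $password) = ('myuser', 'mypassword');

my %attrs = (
    RaiseError           => 1,
    mysql_enable_utf8    => 1,  # exchange character data with MySQL as UTF-8
    mysql_auto_reconnect => 0,  # do not reconnect silently
);

my $dbh = DBI->connect($dsn, $user, $password, \%attrs);

# With DBIx::Class the same attributes go to the schema's connect();
# My::Schema is a hypothetical schema class.
# my $schema = My::Schema->connect($dsn, $user, $password, \%attrs);
```

Note that this only affects data written from now on; rows that were already stored double encoded stay that way until they are repaired.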

Regards
Matias E. Fernandez




