[Dbix-class] Wrong UTF-8 handling in DBIx::Class/DBD::mysql despite mysql_enable_utf8

Matias E. Fernandez pisco at gmx.ch
Wed May 12 11:33:30 GMT 2010


Hello

My problem is accurately described in this ticket [1]:

> My issue is that DBD::mysql passes all data as-is to the database even when the connection is in utf8 mode. This way all non ASCII characters of non-utf8-tagged strings gets lost in the database. But passing non-utf8-tagged strings to DBD::mysql should be absolutely valid, since they're valid for Perl they should be valid for DBD::mysql as well.

This is about "The Unicode Bug" [2] and will cause the following test to fail:

my $title = "\x{e4}\x{f6}\x{fc}"; # "äöü"
$album->title($title);
$album->update();
$album->discard_changes();
is($album->title(), $title, "UTF-8 column survives read/write cycle and preserves character semantics");

Relying on the format in which Perl internally holds strings is a bad idea, especially since [3]:

> by default, the internal format is either ISO-8859-1 (latin-1), or utf8, depending on the history of the string.
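To illustrate the point of that quote, here is a minimal sketch using only the core utf8:: functions. The variable names are made up for this example. It shows two strings that Perl considers equal although they are stored differently internally:

```perl
use strict;
use warnings;

# Two copies of the same three characters, "äöü".
my $latin1 = "\x{e4}\x{f6}\x{fc}";   # code points < 256: stored as latin-1 bytes
my $utf8   = "\x{e4}\x{f6}\x{fc}";
utf8::upgrade($utf8);                # same characters, now stored as UTF-8 internally

# To Perl the two strings are equal, character for character ...
my $same_chars = $latin1 eq $utf8 ? 1 : 0;

# ... but their internal representation (the UTF8 flag) differs.
my $flag_latin1 = utf8::is_utf8($latin1) ? 1 : 0;
my $flag_utf8   = utf8::is_utf8($utf8)   ? 1 : 0;

print "equal: $same_chars, flags: $flag_latin1 vs $flag_utf8\n";
```

Code that passes the internal buffer on as-is will therefore hand over different bytes for these two strings, although they hold the same text.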

The rule of thumb for handling data in a program is outlined in "I/O flow (the actual 5 minute tutorial)" [4]:

> 1. Receive and decode
> 2. Process
> 3. Encode and output
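A small sketch of these three steps with the Encode module follows. The input bytes are hard-coded here as a stand-in for data that would really come from a file, socket or database:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# 1. Receive and decode: bytes from the outside become a character string.
my $bytes_in = "\xc3\xa4\xc3\xb6\xc3\xbc";   # "äöü" as UTF-8 octets
my $text     = decode('UTF-8', $bytes_in);

# 2. Process: work with characters, not bytes.
my $char_count = length $text;               # counts characters
my $upper      = uc $text;                   # "ÄÖÜ"

# 3. Encode and output: characters become bytes again.
my $bytes_out = encode('UTF-8', $upper);
```

Between steps 1 and 3 the program never needs to know how Perl stores the string internally.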

The first step is handled by DBD::mysql, but the third is not! Thus, if I want to communicate with a database in UTF-8, I need to encode my data myself, from whatever format Perl currently holds it in, to UTF-8.

I checked the code base of DBD::mysql [5] and found no place where data is encoded to UTF-8, but I did find a test file [6] in which a string like this

my $blob = "\x{c4}\x{80}dam"; # same as utf8_str but not utf8 encoded

is being tested for the UTF8 flag after a read/write cycle to the database. I'm not sure whether this is a correct test case, because $blob is not really a blob, but a string that suffers from "The Unicode Bug" [2]. I do understand that blobs should not be decoded or encoded. However, I think that the correct approach according to [4] would be to call

Encode::encode('UTF-8', $data);

before sending it to the database, if mysql_enable_utf8 is being used.
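As a sketch of why explicit encoding helps (plain Perl, no database involved, variable names invented for the example): Encode::encode produces the same UTF-8 octets regardless of how the string is stored internally.

```perl
use strict;
use warnings;
use Encode ();

# The same text in both possible internal representations.
my $native   = "\x{e4}\x{f6}\x{fc}";         # no UTF8 flag
my $upgraded = "\x{e4}\x{f6}\x{fc}";
utf8::upgrade($upgraded);                    # UTF8 flag set

# Encoding looks at the characters, not at the internal storage.
my $bytes_a = Encode::encode('UTF-8', $native);
my $bytes_b = Encode::encode('UTF-8', $upgraded);

print $bytes_a eq $bytes_b ? "same octets\n" : "different octets\n";
```

This only demonstrates the encoding step itself; whether DBD::mysql should do it, and how blobs would be exempted, is the open question.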

One way to work around these issues is to use DBIx::Class::UTF8Columns, but it seems to be deprecated because it suffers from a bug [7]

> deep in the core of DBIx::Class which affects any component attempting to perform encoding/decoding by overloading store_column and get_columns. As a result of this problem create sends the original column values to the database, while update sends the encoded values. DBIx::Class::UTF8Columns and DBIx::Class::ForceUTF8 are both affected by this bug.

We have come up with a solution that makes use of DBIx::Class::InflateColumn, which is described as follows [8]:

> This component translates column data into references, i.e. "inflating" the column data. It also "deflates" references into an appropriate format for the database.

This seems like the right tool to de/encode data before it is sent to the database:

__PACKAGE__->inflate_column('title' => {
    inflate => sub {
        my ($value, $row_obj) = @_;

        # DBD should have already done decoding
        return $value;
    },
    deflate => sub {
        my ($value, $row_obj) = @_;

        # Always Encode, as DBD won't do it
        return Encode::encode('UTF-8', $value);
    },
});

Note that in the above example I assume that mysql_enable_utf8 is being used!

As I have not found the bug description mentioned in [7], I would like to ask whether this solution suffers from the same issues as DBIx::Class::UTF8Columns does.

Regards
Matias E. Fernandez

[1] https://rt.cpan.org/Public/Bug/Display.html?id=25590#txn-300430
[2] http://perldoc.perl.org/5.12.0/perlunicode.html#The-%22Unicode-Bug%22
[3] http://perldoc.perl.org/5.12.0/perlunifaq.html#I-lost-track%3b-what-encoding-is-the-internal-format-really%3f
[4] http://perldoc.perl.org/5.12.0/perlunitut.html#I%2fO-flow-(the-actual-5-minute-tutorial)
[5] http://search.cpan.org/dist/DBD-mysql/
[6] http://cpansearch.perl.org/src/CAPTTOFU/DBD-mysql-4.014/t/55utf8.t
[7] http://search.cpan.org/~frew/DBIx-Class-0.08121/lib/DBIx/Class/UTF8Columns.pm#Warning_-_Module_does_not_function_properly_on_create/insert
[8] http://search.cpan.org/~frew/DBIx-Class-0.08121/lib/DBIx/Class/InflateColumn.pm




