[Dbix-class] Wrong UTF-8 handling in DBIx::Class/DBD::mysql despite mysql_enable_utf8

Marc Mims marc at questright.com
Wed May 12 22:19:30 GMT 2010


* Matias E. Fernandez <pisco at gmx.ch> [100512 14:29]:
> On 2010-05-12, at 17:16, Marc Mims wrote:
> >> It is a string consisting of the three characters \x{e4}, \x{f6} and \x{fc}. 
> >> That's about all I have to know as a Perl user, reread [1] if in doubt. The 
> >> important thing to know is that you cannot rely on Perl internally holding 
> >> strings in UTF-8! Of course I could force Perl to internally hold this string in 
> >> UTF-8 by using utf8::upgrade(), but the question is: where should I do that so 
> >> as to cover all cases? As pointed out in [2], overwriting get_columns and 
> >> store_columns won't work reliably. That's why I suggested using the 
> >> inflate/deflate subroutines, but will this work in all cases? Even then it would 
> >> be a bad idea to use utf8::upgrade() because that's not what it's meant for. As
> >> pointed out in [3] the flow should be as follows:
> > 
> > No.
> 
> What do you mean by "no"? Which part of the passage do you disagree with?
> 
> > It's a string consisting of 3 bytes that happen to be latin-1 characters.

I thought the statement above clarified the "no".

> I disagree with that. Consider this:
> 
> my $string = "\x{e4}\x{f6}\x{fc}";
> utf8::upgrade($string);
> 
> my $other_string = "\x{e4}\x{f6}\x{fc}";
> 
> ok($string eq $other_string, "upgraded and not upgraded character strings are equal");
> 
> Both $string and $other_string are perfectly valid Perl character strings, and 
> they are equal. How Perl holds them internally doesn't and shouldn't matter.

Unfortunately, it does matter.  Perl supports two kinds of strings: byte
strings and Unicode strings.  For legacy reasons, byte strings are
interpreted as Latin-1. In your example, $string (after the
utf8::upgrade) is a Unicode string, i.e. Perl's internal UTF8 flag is
set on it. $other_string does not have that flag.  DBD::mysql with
mysql_enable_utf8 will be happy with $string but apparently isn't
happy with $other_string.
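A small Perl sketch of the distinction, reusing the two strings from the
example above. It only shows what core Perl does: eq and length treat
the strings as equal, utf8::is_utf8 reports different internal flags, and
an explicit encode('UTF-8', ...) from the core Encode module produces
identical bytes for both. It does not talk to MySQL; whether a given
DBD::mysql version actually looks at the flag is an assumption taken from
the behaviour described in this thread, not something the script proves.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Two strings holding the same three characters: U+00E4, U+00F6, U+00FC
my $string       = "\x{e4}\x{f6}\x{fc}";
my $other_string = "\x{e4}\x{f6}\x{fc}";

# Switch $string to Perl's internal UTF-8 representation (sets the UTF8 flag)
utf8::upgrade($string);

# At the Perl level the two strings are the same...
print "eq: ", ($string eq $other_string ? "yes" : "no"), "\n";
print "length: ", length($string), " vs ", length($other_string), "\n";

# ...but the internal flag differs, so code that inspects the flag
# (as DBD::mysql apparently does here) can treat them differently
print "string UTF8 flag: ",       (utf8::is_utf8($string)       ? 1 : 0), "\n";
print "other_string UTF8 flag: ", (utf8::is_utf8($other_string) ? 1 : 0), "\n";

# Explicit encoding ignores the flag: both give the same six UTF-8 bytes
my $bytes_a = encode('UTF-8', $string);
my $bytes_b = encode('UTF-8', $other_string);
print "encoded equal: ", ($bytes_a eq $bytes_b ? "yes" : "no"), "\n";
print "encoded length: ", length($bytes_a), "\n";
```

Running it should print "eq: yes", lengths of 3 and 3, flags 1 and 0,
and identical 6-byte encodings.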

I was just trying to be helpful.  Like I said, I'm no Unicode expert.  I
struggled with the same problems you seem to be having, resolved them,
and now think I have a working understanding of Unicode in Perl.

If my comments haven't helped, I'll defer to someone with more expertise
on the subject who can answer your questions more articulately.

Good luck.

	-Marc


