[Dbix-class] Unicode woes

Dave Howorth dhoworth at mrc-lmb.cam.ac.uk
Wed Jan 8 14:38:11 GMT 2014


I'm having some grief with Unicode. There's a MySQL database with a
table that contains a text column with Unicode values. When I read those
values with DBIx::Class they emerge as Windows-1252 and I'm going mad in
the twisty little passages trying to find whatever it is I need to
change to get at the original Unicode values.

The DDL for the table is like this:

DROP TABLE IF EXISTS `text_for_pages`;
CREATE TABLE `text_for_pages` (
  `page` text NOT NULL,
  `keytext` text NOT NULL,
  `value` text,
  PRIMARY KEY (`page`(30),`keytext`(30))
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

I believe the values of the 'value' column really are Unicode because if
I execute a command like:

  mysql -e 'use mydb; select * from text_for_pages where page = "page-name"' \
    > dump.txt

and then examine dump.txt with 'od', I can see the two-byte UTF-8
sequences I expect (non-breaking spaces, left and right quotes).
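For comparison with the od output, here is a minimal sketch in core Perl that prints the UTF-8 bytes one would expect for these characters. It assumes the quotes are the *single* curly quotes (U+2018/U+2019), since \x91 and \x92 are the Windows-1252 codes for those; `utf8_hex` is a hypothetical helper name. Note that in UTF-8 the no-break space is two bytes, while each curly quote is three.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: return the UTF-8 encoding of a character string
# as space-separated upper-case hex bytes.
sub utf8_hex {
    my ($str) = @_;
    my $bytes = encode('UTF-8', $str);
    return join ' ', map { sprintf '%02X', ord($_) } split //, $bytes;
}

# Assumption: "left and right quotes" means the single curly quotes,
# because \x91/\x92 are their Windows-1252 codes.
my @chars = (
    [ 'NO-BREAK SPACE',              "\x{00A0}" ],
    [ 'LEFT SINGLE QUOTATION MARK',  "\x{2018}" ],
    [ 'RIGHT SINGLE QUOTATION MARK', "\x{2019}" ],
);

for my $c (@chars) {
    my ($name, $char) = @$c;
    printf "%-28s U+%04X  %s\n", $name, ord($char), utf8_hex($char);
}
```

Running `od -An -tx1 dump.txt` prints the file as hex bytes in the same style, so the two outputs can be compared directly.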

But when I run my DBIC program and dump the value, I see single-byte
characters such as \xA0, \x91 and \x92, which I believe are the
Windows-1252 codes for those characters.
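One way to make the dump unambiguous is to print the code point of every character in the fetched string. In a Perl character string, a real left quote shows up as U+2018, whereas a Windows-1252-style byte shows up as U+0091. This is a minimal sketch. `describe_chars` is a hypothetical helper, and `$row->value` in the comment stands for whatever row object and accessor the real program uses.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: list the code point of each character in a string.
sub describe_chars {
    my ($str) = @_;
    return join ' ', map { sprintf 'U+%04X', ord($_) } split //, $str;
}

# In the real program the argument would be the fetched column, e.g.
# describe_chars($row->value), where $row is a (hypothetical) DBIC row.
print describe_chars("\xA0\x91\x92"), "\n";           # what the dump showed
print describe_chars("\x{A0}\x{2018}\x{2019}"), "\n"; # what was expected
```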

At connection time, I presently set a couple of options:

  {
    mysql_enable_utf8 => 1,
    on_connect_do     => "SET NAMES 'utf8'",
  };

What should I be doing differently?

Cheers, Dave


