[Catalyst] utf8 / pg double encoding problem

Daniel McBrearty danielmcbrearty at gmail.com
Sat Jan 5 22:54:59 GMT 2008


Well, I'm damned. I thought I had this stuff working squeaky clean, but
I was wrong. I actually had two bugs cancelling each other out -
usually.

Anyway, cut to the chase. The problem (the second bug that didn't show
until I cleared the first one) is a form submit that stores in the
database but ends up being double encoded when you get it back.

I am *certain* that UTF-8 is getting sent over the wire to Catalyst.

Here is some test data:

The data being submitted is this, with two ways of representing it:

abçöeü
unicode = 61 62 e7 f6 65 fc
utf8 = 61 62 (c3 a7) (c3 b6) 65 (c3 bc)
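
The two representations above can be reproduced with a few lines of
Perl. This is only a minimal sketch: the string is written with hex
escapes so the encoding of the source file does not matter, and the
variable names are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);

# The test string, written with escapes: a b c-cedilla o-umlaut e u-umlaut
my $text = "ab\x{e7}\x{f6}e\x{fc}";

# One hex number per character (the code points)
my $codepoints = join ' ', map { sprintf '%02x', ord($_) } split //, $text;

# One hex number per byte once the string is encoded as UTF-8
my $utf8_bytes = join ' ',
    map { sprintf '%02x', ord($_) } split //, encode( 'UTF-8', $text );

print "unicode = $codepoints\n";    # unicode = 61 62 e7 f6 65 fc
print "utf8    = $utf8_bytes\n";    # utf8    = 61 62 c3 a7 c3 b6 65 c3 bc
```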

I have Catalyst::Plugin::Unicode loaded. The database is a PostgreSQL
8.2 UTF-8 database.

Here is a snip of the controller code:

<snip>
sub sitetext_update : Chained('translator') Args(0) {
  my ( $self, $c ) = @_;

  # The submitted form field, or an empty string if it is missing
  my $edit = $c->req->param('edit') || '';
  $c->log->debug( $edit );
  $c->log->dumper( $edit );
  # utf8::is_utf8 reports Perl's internal UTF8 flag on the scalar
  $c->log->debug( utf8::is_utf8( $edit ) ? "it's UTF8!" : "no it isn't UTF8" );

  # ... ($edit gets written to the db here)

}
</snip>

Here's what we see in the debug output:

<snip>
.-------------------------------------+--------------------------------------.
| Parameter                           | Value                                |
+-------------------------------------+--------------------------------------+
| edit                                | abçöeü                               |
'-------------------------------------+--------------------------------------'
[debug] abçöeü
[debug] $VAR1 = "ab\x{c3}\x{a7}\x{c3}\x{b6}e\x{c3}\x{bc}";
[debug] it's UTF8!
</snip>
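
A note on reading that dump: it shows nine characters (c3, a7, ...),
one per UTF-8 byte, rather than six. The UTF8 flag by itself cannot
tell a properly decoded string from one whose bytes were merely
upgraded, as this sketch tries to show. It only assumes the core Encode
module; the variable names are invented, and it makes no claim about
what Catalyst or the plugin actually do internally.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The nine UTF-8 bytes of the test string
my $bytes = "ab\xc3\xa7\xc3\xb6e\xc3\xbc";

# Case 1: properly decoded, giving six characters
my $decoded = decode( 'UTF-8', $bytes );

# Case 2: never decoded, only upgraded to Perl's internal format.
# Still nine characters, each one a former byte, yet the flag is on.
my $upgraded = $bytes;
utf8::upgrade($upgraded);

for my $s ( $decoded, $upgraded ) {
    printf "flag=%d length=%d chars=%s\n",
        ( utf8::is_utf8($s) ? 1 : 0 ), length($s),
        join( ' ', map { sprintf '%x', ord($_) } split //, $s );
}
# flag=1 length=6 chars=61 62 e7 f6 65 fc
# flag=1 length=9 chars=61 62 c3 a7 c3 b6 65 c3 bc
```

The second line has the same character values as the dumper output
above, so checking length() or the individual ord() values says more
than checking the flag.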

Here is what shows up in the pg logfile:

2008-01-05 23:21:45 CET LOG:  execute dbdpg_11: UPDATE
sitetext_translated SET content = $1, timestamp = $2 WHERE (
language_id = $3 AND sitetext_id = $4 )
2008-01-05 23:21:45 CET DETAIL:  parameters: $1 = 'abçöeü', $2 =
'2008-01-05 22:21:45', $3 = '22', $4 = 'ca'

And yes, it is double encoded when we retrieve the data.

Why? We have a good UTF-8 string coming in. It is flagged as such (odd
that the usual debug output doesn't display it right, though ...). PG
expects UTF-8 from the client. So why?
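
For reference, this is how double encoding can be reproduced in
isolation: bytes that are already UTF-8 get encoded a second time, as
if each byte were a Latin-1 character. A rough sketch with the core
Encode module and made-up variable names; it does not show where in the
Catalyst/DBI stack a second encoding would happen, which is the open
question here.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $text = "ab\x{e7}\x{f6}e\x{fc}";    # the six-character test string

my $once  = encode( 'UTF-8', $text );  # 9 bytes, correct UTF-8
my $twice = encode( 'UTF-8', $once );  # 15 bytes, double encoded

print join( ' ', map { sprintf '%02x', ord($_) } split //, $twice ), "\n";
# 61 62 c3 83 c2 a7 c3 83 c2 b6 65 c3 83 c2 bc

# Decoding once gives nine characters of mojibake; decoding twice
# recovers the original six.
my $read_back = decode( 'UTF-8', $twice );
my $repaired  = decode( 'UTF-8', $read_back );
printf "%d then %d characters\n", length($read_back), length($repaired);
# 9 then 6 characters
```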

I use utf8columns on the database. As I understand it, this just tells
Perl "this is utf8, you can flag it as such" when it gets data from
that column, so it should not be an issue, right?
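
To spell out that understanding: a sketch of "flag it on the way out of
the database", with the mirror-image step on the way in. from_db and
to_db are hypothetical helpers written for this example, not the
component's actual code, and the stored bytes are assumed to be plain
UTF-8.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for what happens around a UTF-8 column.
sub from_db { my ($raw) = @_; utf8::decode($raw); return $raw; }  # bytes -> characters
sub to_db   { my ($str) = @_; utf8::encode($str); return $str; }  # characters -> bytes

# What a UTF-8 database would hold for the test string (assumed)
my $stored = "ab\xc3\xa7\xc3\xb6e\xc3\xbc";

my $value = from_db($stored);
printf "%d characters, flag %s\n",
    length($value), utf8::is_utf8($value) ? 'on' : 'off';
# 6 characters, flag on

print to_db($value) eq $stored
    ? "round trip is clean\n"
    : "round trip changed the bytes\n";
# round trip is clean
```

The round trip is only clean when the value handed to to_db holds real
characters; a string that is still nine byte-valued characters would
come out of it double encoded.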

Thanks if you can explain this. I really hope that an hour from now I
will be saying "DOH" and slapping ice cream into my forehead.

