[Catalyst] utf8 / pg double encoding problem

Andrew Rodland arodland at comcast.net
Sat Jan 5 23:28:49 GMT 2008


On Saturday 05 January 2008 04:54:59 pm Daniel McBrearty wrote:
> well I'm damned, I thought I had this stuff working squeaky clean. But
> I was wrong. I actually had two bugs cancelling each other out -
> usually.
> [snip]
> [debug] abçöeü
> [debug] $VAR1 = "ab\x{c3}\x{a7}\x{c3}\x{b6}e\x{c3}\x{bc}";
> [debug] it's UTF8!
>
Looks like the problem is here. The utf8 flag is on, indicating that $edit 
is a string of characters rather than bytes, but the dumper output shows 
that these "characters" are the individual bytes of the UTF-8 encoding, not 
the actual characters of the data. That means the bytes actually stored in 
the string are along the lines of "ab\x{c3}\x{83}\x{c2}\x{a7}"... not good. 
Somewhere, your data got the utf8 flag set "by assumption" instead of by 
decoding. $edit = decode("UTF-8", $edit) should clear it up, although 
finding the original problem is probably a better idea. :)
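
For illustration, here is a minimal sketch of the symptom and the fix. It 
is not the code from the original application: the variable names other 
than $edit are made up, and utf8::upgrade is only one assumed way of ending 
up with a string whose "characters" are really UTF-8 bytes.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# The intended text "abçöeü" as real characters (code points).
my $chars = "ab\x{e7}\x{f6}e\x{fc}";

# What typically arrives from a socket or database: UTF-8 encoded bytes.
my $bytes = encode("UTF-8", $chars);

# Assumed reproduction of the bug: the bytes are treated as characters
# without being decoded. utf8::upgrade turns the flag on but leaves each
# byte as one "character", so the string holds \x{c3}, \x{a7}, and so on.
my $edit = $bytes;
utf8::upgrade($edit);

# The flag is on, yet the string is too long: one "character" per byte.
my $flag_on    = utf8::is_utf8($edit) ? 1 : 0;
my $broken_len = length $edit;

# The suggested fix: decode the UTF-8 byte values into real characters.
my $fixed     = decode("UTF-8", $edit);
my $fixed_len = length $fixed;

binmode STDOUT, ":encoding(UTF-8)";
print "flag on: $flag_on, length before: $broken_len, after: $fixed_len\n";
print "round trip ok\n" if $fixed eq $chars;
```

The length check is the quick diagnostic here: a correctly decoded string 
has one character per letter, while the broken one has one per byte.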

Andrew


