[Catalyst] TT and UNICODE: Garbled special characters

Matt Lawrence matt.lawrence at ymogen.net
Fri Sep 7 16:33:21 GMT 2007


Matt Lawrence wrote:
> Stefan Kühn wrote:
>   
>>    GERMAN UMLAUT HERE: ___\xFC\xFC\xFC___
>>   
>>     
> AFAIK, single-byte-width \xxx escapes are always treated as bytes, not
> as characters. Even if they are outside the 7-bit range, and even in the
> presence of the utf8 pragma.
>
> Try inserting real Unicode characters into the string, explicitly
> upgrading the string using utf8::upgrade or utf8 or use encoding 'latin1'.
>   
Oops, that last paragraph wasn't very clear, and utf8::upgrade was not a
good suggestion. I'll try again:

# Option 1
use utf8; # tell Perl the program text itself is UTF-8 encoded
my $name = "Stefan Kühn"; # a literal ü; the file must be saved as UTF-8

# Option 2
use Encode qw( decode );
my $name = decode("latin-1", "Stefan K\xfchn");

# Option 3
use encoding 'latin1';
my $name = "Stefan K\xfchn";

Once you have a Unicode string that Perl has internally marked as such
(utf8::is_utf8 returns true for it), Catalyst::Plugin::Unicode should do
the right thing with it.
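
To see the difference for yourself, here is a minimal sketch (not taken
from the plugin's code) that compares the raw literal with the decoded
string from Option 2. The variable names are only for illustration, and it
assumes the plugin keys off Perl's internal flag, as described above.

```perl
use strict;
use warnings;
use Encode qw( decode encode );

# A literal built with a \xNN escape below 0x100 is stored as plain bytes,
# without the internal UTF-8 flag.
my $raw = "Stefan K\xfchn";

# Decoding from Latin-1 gives a character string that is flagged internally.
my $decoded = decode("iso-8859-1", $raw);

printf "raw flagged:     %s\n", utf8::is_utf8($raw)     ? "yes" : "no";
printf "decoded flagged: %s\n", utf8::is_utf8($decoded) ? "yes" : "no";

# Both strings hold the same characters; only the internal marking differs.
printf "same characters: %s\n", $raw eq $decoded ? "yes" : "no";

# Encoding the decoded string for output turns the single character U+00FC
# into two bytes. "Stefan K" is 8 characters, so the umlaut starts at offset 8.
my $bytes = encode("UTF-8", $decoded);
printf "u-umlaut as UTF-8 bytes (hex): %s\n", unpack("H*", substr($bytes, 8, 2));
```

The raw literal should report "no" and the decoded one "yes", even though
the two compare equal. That would explain why the \xFC version comes out
garbled while the decoded version is encoded properly on output.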

Matt



