[Catalyst] utf8 in regexes in Catalyst
Jonathan Rockway
jon at jrock.us
Mon Mar 3 18:45:26 GMT 2008
* On Sun, Mar 02 2008, Jonathan Rockway wrote:
>
> Correctly decoded data:
>
> perl -MDevel::Peek -e 'my $data = "ほげ"; utf8::decode($data); Dump($data)'
>
> SV = PV(0x72b098) at 0x72e3e0
> REFCNT = 1
> FLAGS = (PADMY,POK,pPOK,UTF8)
> PV = 0x73aa40 "\343\201\273\343\201\222"\0 [UTF8 "\x{307b}\x{3052}"]
> CUR = 6
> LEN = 8
I forgot to mention that Devel::StringInfo is much nicer than this.
Devel::Peek will tell you if the "character flag" is on, but it won't do
much else. Devel::StringInfo will tell you lots of stuff about the
string under test. Example:
  $ perl -MDevel::StringInfo
  my $string = "ほげ";
  Devel::StringInfo->new->dump_info($string)

  string: ã\201»ã\201\222
  is_utf8: 0
  octet_length: 6
  valid_utf8: 1
  decoded_is_same: 0
  decoded:
    octet_length: 6
    downgradable: 0
    char_length: 2
    string: ほげ
    is_utf8: 1
  raw = <<ã\201»ã\201\222>>

Note that my string isn't correctly decoded utf8 (I didn't "use utf8" to
decode the literal), but Devel::StringInfo noticed that the octets are
valid utf8, and showed me what would happen if I decoded them correctly.
Very helpful. It also knows about encodings other than utf8 or latin-1,
in case those accidentally get into your application.
Generally a good tool.
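
If you want to check the same numbers by hand, here is a minimal sketch
using only core Perl (Encode and the built-in utf8:: functions). It is
not how Devel::StringInfo works internally, just a way to reproduce the
is_utf8 / octet_length / char_length values from the report above. The
variable names are made up for illustration, and the octets are written
as escapes so the encoding of the script file doesn't matter.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The six UTF-8 octets of "ほげ" (U+307B U+3052), as an undecoded byte string.
my $octets = "\xe3\x81\xbb\xe3\x81\x92";

# Before decoding: character flag off, length counts octets.
my $flag_before   = utf8::is_utf8($octets) ? 1 : 0;
my $length_before = length $octets;

# Decode strictly: FB_CROAK dies on malformed input, and LEAVE_SRC
# keeps decode() from consuming $octets in place.
my $chars = decode('UTF-8', $octets, Encode::FB_CROAK | Encode::LEAVE_SRC);

# After decoding: character flag on, length counts characters.
my $flag_after   = utf8::is_utf8($chars) ? 1 : 0;
my $length_after = length $chars;

printf "before: is_utf8=%d length=%d\n", $flag_before, $length_before;
printf "after:  is_utf8=%d length=%d\n", $flag_after,  $length_after;
```

This prints is_utf8=0 length=6 for the raw octets and is_utf8=1 length=2
for the decoded string, which matches the top-level and "decoded"
sections of the Devel::StringInfo report.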
Regards,
Jonathan Rockway