[Catalyst] utf8 in regexes in Catalyst

Jonathan Rockway jon at jrock.us
Mon Mar 3 18:45:26 GMT 2008


* On Sun, Mar 02 2008, Jonathan Rockway wrote:
>
> Correctly decoded data:
>
>   perl -MDevel::Peek -e 'my $data = "ほげ"; utf8::decode($data); Dump($data)'    
>
>   SV = PV(0x72b098) at 0x72e3e0
>     REFCNT = 1
>     FLAGS = (PADMY,POK,pPOK,UTF8)
>     PV = 0x73aa40 "\343\201\273\343\201\222"\0 [UTF8 "\x{307b}\x{3052}"]
>     CUR = 6
>     LEN = 8

I forgot to mention that Devel::StringInfo is much nicer than this.
Devel::Peek will tell you if the "character flag" is on, but it won't do
much else.  Devel::StringInfo tells you a lot more about the string
under test: whether it is valid utf8, its octet and character lengths,
and what it looks like once decoded.  Example:
    
    $ perl -MDevel::StringInfo 
      my $string = "ほげ"; 
      Devel::StringInfo->new->dump_info($string)

    string: ã\201»ã\201\222
    is_utf8: 0
    octet_length: 6
    valid_utf8: 1
    decoded_is_same: 0
    decoded:
      octet_length: 6
      downgradable: 0
      char_length: 2
      string: ほげ
      is_utf8: 1
    raw = <<ã\201»ã\201\222>>

Note that my string isn't correctly decoded (I didn't "use utf8" to
decode the literal), but Devel::StringInfo noticed that the string is
valid utf8, and showed me what would happen if I decoded it correctly.
Very helpful.  It also knows about encodings other than utf8 or latin-1,
in case those accidentally get into your application.
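
If you can't install Devel::StringInfo, the two most important numbers
in that dump (is_utf8 and the octet vs. character length) can be
reproduced with the core utf8:: functions.  This is only a rough sketch
of the idea, not how Devel::StringInfo works internally; the variable
names are made up for illustration, and the literal is written as octet
escapes so the result doesn't depend on how the file is saved:

```perl
use strict;
use warnings;

# The six UTF-8 octets of "ほげ", as they arrive when nothing has decoded them.
my $bytes = "\xe3\x81\xbb\xe3\x81\x92";

# Decode a copy in place; utf8::decode returns true if the octets were valid UTF-8.
my $chars = $bytes;
my $valid = utf8::decode($chars);

printf "valid_utf8: %d\n", $valid ? 1 : 0;
printf "before: is_utf8=%d length=%d\n",
    (utf8::is_utf8($bytes) ? 1 : 0), length $bytes;
printf "after:  is_utf8=%d length=%d\n",
    (utf8::is_utf8($chars) ? 1 : 0), length $chars;
```

The "before" line should report is_utf8=0 and length=6, and the "after"
line is_utf8=1 and length=2, matching octet_length and char_length in
the dump above.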

Generally a good tool.

Regards,
Jonathan Rockway



