[Catalyst] Patch for Catalyst::Plugin::Unicode::Encoding

Tatsuhiko Miyagawa miyagawa at gmail.com
Wed Mar 19 05:59:22 GMT 2008


Well, if is_utf8($c->req->param($foo)) returns true, it surely means
the string was already decoded. But even if is_utf8 returns false, that
*does not necessarily mean* the string was not decoded.

And that's the most annoying part.

  use Encode ();   # needed for Encode::decode_utf8 below

  my $foo = "bar"; # all ascii

  utf8::decode($foo);
  utf8::is_utf8($foo); # false  -- Catalyst::Plugin::Unicode does this

  my $bar = Encode::decode_utf8($foo);
  utf8::is_utf8($bar); # true -- Catalyst::Plugin::Unicode::Encoding does this
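
To make the point concrete, here is a small self-contained sketch (the
variable names are only for illustration) that prints the flag of each
string and then compares the two. The flag is an internal detail, so
both strings should still compare equal as text even though is_utf8
reports differently for them.

```perl
use strict;
use warnings;
use Encode ();

my $foo = "bar";                          # all ASCII
utf8::decode($foo);                       # in-place decode, as in the snippet above

my $bar = Encode::decode_utf8($foo);      # returns a new, decoded string

# Report the internal flag of each string, then compare their contents.
printf "foo flagged: %s\n", utf8::is_utf8($foo) ? "yes" : "no";
printf "bar flagged: %s\n", utf8::is_utf8($bar) ? "yes" : "no";
printf "same text:   %s\n", $foo eq $bar ? "yes" : "no";
```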

Some modules, like XML::LibXML, set the UTF-8 flag regardless of whether
the characters being handled all fall within the latin-1 range (that is,
they behave like Encode::decode_utf8 rather than utf8::decode), and I
think that's a pretty realistic and sane approach.

> perl -MXML::LibXML -e 'warn utf8::is_utf8(XML::LibXML->new->parse_string(<>)->childNodes->shift->textContent)'
<?xml version="1.0" encoding="utf-8"?><response>foo</response>

1 at -e line 1.

I agree with Bill that it doesn't make any sense for the plugin to try
to decode an already UTF-8-flagged string, but furthermore, I wonder
under which circumstances the plugin ends up trying to decode
already-utf8-flagged strings.

I'd say that's the root problem.
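
For reference, the kind of guard being discussed could look roughly
like the sketch below. decode_unless_flagged is a hypothetical helper
made up for this example, not code from either plugin, and it assumes
the incoming values are either UTF-8 octets or already-decoded
character strings. It only papers over the symptom: it skips strings
that carry the flag, but it cannot tell whether an unflagged string
was decoded earlier.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not part of Catalyst::Plugin::Unicode or
# Catalyst::Plugin::Unicode::Encoding): decode a value as UTF-8 unless
# Perl already marks it as a character string.
sub decode_unless_flagged {
    my ($value) = @_;
    return $value if !defined $value || utf8::is_utf8($value);
    return Encode::decode('UTF-8', $value);
}

my $bytes = "caf\xc3\xa9";                   # 5 UTF-8 encoded octets
my $once  = decode_unless_flagged($bytes);   # decoded to 4 characters
my $twice = decode_unless_flagged($once);    # flagged, so returned untouched

printf "%d %d %d\n", length $bytes, length $once, length $twice;   # 5 4 4
```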


On 3/18/08, Jonathan Rockway <jon at jrock.us> wrote:
>
> A key thing I forgot to mention is that "is_utf8" doesn't mean "we tried
> to decode this already".  It means that the internal representation of
> the string is utf8.
>
> Regards,
> Jonathan Rockway
>
> --
> print just => another => perl => hacker => if $,=$"


-- 
Tatsuhiko Miyagawa


