[Catalyst] Patch for Catalyst::Plugin::Unicode::Encoding
Tatsuhiko Miyagawa
miyagawa at gmail.com
Wed Mar 19 05:59:22 GMT 2008
Well, if is_utf8($c->req->param($foo)) returns true, it surely means
the string was already decoded. But if is_utf8 returns false, that
does *not necessarily* mean the string was not decoded.
And that's the most annoying part.
use Encode;
my $foo = "bar";                      # all ASCII
utf8::decode($foo);
utf8::is_utf8($foo);                  # false -- Catalyst::Plugin::Unicode does this
my $bar = Encode::decode_utf8($foo);
utf8::is_utf8($bar);                  # true -- Catalyst::Plugin::Unicode::Encoding does this
Some modules, like XML::LibXML, add the UTF-8 flag regardless of
whether the characters being handled all fall in the Latin-1 range
(as Encode::decode_utf8 does, unlike utf8::decode), and I think
that's a pretty realistic and sane approach.
> perl -MXML::LibXML -e 'warn utf8::is_utf8(XML::LibXML->new->parse_string(<>)->childNodes->shift->textContent)'
<?xml version="1.0" encoding="utf-8"?><response>foo</response>
1 at -e line 1.
I agree with Bill that the plugin trying to decode an already
UTF-8-flagged string doesn't make any sense, but beyond that, I wonder
under which circumstances the plugin tries to decode already-UTF-8-flagged
strings in the first place.
I'd say that's the root problem.
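A minimal Perl sketch of why the flag alone can't tell us whether a string was decoded (the variable names are purely illustrative): the same four-character string can exist with or without the UTF-8 flag, because the flag only describes Perl's internal storage. It uses only the core utf8::upgrade and utf8::is_utf8 functions, nothing Catalyst-specific.

```perl
use strict;
use warnings;

# "caf\x{e9}" is a four-character string. Every character is below 256,
# so Perl stores it internally as single bytes: the UTF-8 flag is off.
my $latin1 = "caf\x{e9}";

# utf8::upgrade changes only the internal representation to UTF-8;
# the characters themselves are untouched.
my $upgraded = $latin1;
utf8::upgrade($upgraded);

printf "is_utf8(\$latin1):   %d\n", utf8::is_utf8($latin1)   ? 1 : 0;   # 0
printf "is_utf8(\$upgraded): %d\n", utf8::is_utf8($upgraded) ? 1 : 0;   # 1
printf "same string?        %s\n", $latin1 eq $upgraded ? "yes" : "no"; # yes
printf "length:             %d\n", length $upgraded;                    # 4
```

Both variables hold identical character strings, yet only one is flagged, so a plugin that branches on is_utf8 is really branching on an internal detail.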
On 3/18/08, Jonathan Rockway <jon at jrock.us> wrote:
>
> A key thing I forgot to mention is that "is_utf8" doesn't mean "we tried
> to decode this already". It means that the internal representation of
> the string is utf8.
>
> Regards,
> Jonathan Rockway
>
> --
> print just => another => perl => hacker => if $,=$"
--
Tatsuhiko Miyagawa