[Catalyst] Re: Patch for Catalyst::Plugin::Unicode::Encoding
Matt Lawrence
matt.lawrence at ymogen.net
Wed Mar 19 10:51:19 GMT 2008
Aristotle Pagaltzis wrote:
> * Tatsuhiko Miyagawa <miyagawa at gmail.com> [2008-03-19 07:20]:
>
>> Some modules like XML::LibXML add UTF-8 flags regardless of whether
>> the characters involved all fall within the Latin-1 range (as
>> Encode::decode_utf8 does, unlike utf8::decode), and that's a pretty
>> realistic and sane approach, I think.
>>
>
> Yes. If the flag is to have any use at all, then it has to have
> the semantic of distinguishing character vs byte strings.
>
>
>> I agree with Bill that the plugin trying to decode an already
>> UTF-8-flagged string doesn't make any sense, but furthermore, I
>> wonder under what circumstances the plugin tries to decode
>> already-UTF-8-flagged strings.
>>
>> I'd say that's the root problem.
>>
>
> Yes; and that’s exactly what Jon said.
>
There are a number of ways that incoming data could already have been
decoded: the environment, perl switches, or pragmata. Ideally, every
application would do as Jon proposes and ensure that nothing decodes
the string before the plugin sees it. But checking the flag before
decoding is at worst harmless and at best prevents data corruption: it
would keep already-decoded strings from being mangled, decode UTF-8 (or
otherwise) encoded strings, and leave unflagged ASCII strings alone,
whether or not decoding had already been attempted.

Perhaps the best approach would be to warn and skip decoding when
flagged data is seen. That way the data should never be mangled, and
the author can see that something else is decoding too early and fix it.
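A minimal sketch of the warn-and-skip idea, written in Python rather than Perl and assuming Python's str/bytes split as a stand-in for Perl's UTF-8 flag (str plays a flagged, already-decoded string; bytes plays an unflagged string of raw octets). The helper name `decode_param` and the choice of `RuntimeWarning` are hypothetical and not part of Catalyst::Plugin::Unicode::Encoding.

```python
import warnings


def decode_param(value, encoding="utf-8"):
    """Hypothetical helper: decode a request value unless it is already text.

    Assumption: str stands in for a Perl string with the UTF-8 flag on
    (already decoded); bytes stands in for an unflagged octet string.
    """
    if isinstance(value, str):
        # Something upstream decoded this value too early. In Perl, decoding
        # it a second time could mangle it, so return it untouched and warn.
        warnings.warn(
            "value is already decoded; skipping decode",
            RuntimeWarning,
            stacklevel=2,
        )
        return value
    # Raw octets: decode exactly once. Pure-ASCII input comes back unchanged.
    return value.decode(encoding)


# Usage example
raw = "caf\u00e9".encode("utf-8")   # octets as they arrive off the wire
text = decode_param(raw)            # decoded once
assert text == "caf\u00e9"
assert decode_param(b"plain ascii") == "plain ascii"
assert decode_param(text) is text   # already decoded: warns, returns as-is
```

Returning the value unchanged instead of raising keeps the request working while still surfacing the early decode to the application author.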
Matt