[Catalyst] Re: Patch for Catalyst::Plugin::Unicode::Encoding

Wed Mar 19 10:51:19 GMT 2008

Aristotle Pagaltzis wrote:
> * Tatsuhiko Miyagawa <miyagawa at gmail.com> [2008-03-19 07:20]:
>   
>> Some modules like XML::LibXML adds UTF-8 flags regardless of if
>> the characters to handle are composed of latin-1 range (like
>> Encode::decode_utf8 instead of utf8::decode), and that's pretty
>> much realistic and sane approach I think.
>>     
>
> Yes. If the flag is to have any use at all, then it has to have
> the semantic of distinguishing character vs byte strings.
>
>   
>> I agree with Bill that the plugin trying to decode already
>> utf-8 flagged string doesn't make any sense, but furthermore, I
>> wonder under which circumstance the plugin tries to decode
>> already-utf8-flagged strings.
>>
>> I'd say that's the root problem.
>>     
>
> Yes; and that’s exactly what Jon said.
>   
There are a number of ways that incoming data could already be decoded: 
the environment, perl switches or pragmata. Ideally every application would 
do as Jon proposes and ensure that nothing decodes the string before the 
plugin sees it. But checking the flag before decoding is at worst 
harmless and at best prevents data corruption: it would prevent 
already-decoded strings from becoming deformed, decode encoded UTF-8 (or 
whatever) strings, and leave unflagged ASCII strings alone, whether or 
not a decode had already been attempted.

Perhaps the best approach would be to warn and not decode when flagged 
data is seen. That way the data should never be deformed, and the author 
can see that something else is decoding too early and fix it.
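
Roughly what I have in mind, as a minimal sketch only. The helper name 
decode_param is hypothetical and this is not the plugin's actual code; it 
also assumes the incoming encoding is UTF-8. The only real calls used are 
utf8::is_utf8 and Encode::decode.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not part of Catalyst::Plugin::Unicode::Encoding):
# decode a value only if it is not already flagged as a character string.
sub decode_param {
    my ($value) = @_;

    return $value unless defined $value;

    if ( utf8::is_utf8($value) ) {
        # Something upstream has already decoded this value. Leave it
        # untouched so it cannot be deformed, and tell the author.
        warn "value is already flagged as characters; skipping decode\n";
        return $value;
    }

    # Unflagged data is treated as UTF-8 encoded bytes (assumption).
    return Encode::decode( 'UTF-8', $value );
}
```

Calling decode_param on its own result then warns and hands the string 
back unchanged rather than attempting a second decode, and plain ASCII 
input comes out with the same characters it went in with.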

Matt