[Catalyst] Re: Avoiding UTF8 in Catalyst

Bill Moseley moseley at hank.org
Wed Dec 9 04:32:35 GMT 2009


On Tue, Dec 8, 2009 at 7:05 PM, Jonathan Rockway <jon at jrock.us> wrote:

> * On Tue, Dec 08 2009, Bill Moseley wrote:
> > On Tue, Dec 8, 2009 at 12:26 AM, Aristotle Pagaltzis <pagaltzis at gmx.de> wrote:
> >
> >     There is no such thing as an octet stream in Perl. There are only
> >     strings, and strings are sequences of arbitrarily large integers.
> >
> > Help me out here.
> >
> > What I've stuck in my mind is that the poorly-named utf8 flag on Perl
> > strings is really the "is_character_data" flag.  To get character data
> > it *must* be decoded on input, and the act of decoding sets that flag.
> > Even decoding an 8-bit character encoding will set the flag.
>
> Sorry, it doesn't mean that.  latin1 text is character data, but won't
> have the UTF8 flag on.



 $ perl -MEncode -wle '$x=Encode::decode("Latin1", "hello"); print Encode::is_utf8( $x ) ? "flag set\n" : "no flag\n";'
flag set


> The UTF8 flag doesn't mean anything more than
> any of the other SV flags.


But the flag being on indicates that the string was decoded.  And that
implies that it needs to be encoded.  And if I don't know what encoding to
use then it's time to throw an exception.

That's why it seems like the Engine should throw an exception if the utf8
flag is set when it's time to get the length.  The encoding is not known,
so it's impossible to know the encoded byte length.
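
Something like the following is what I have in mind. It is only a rough
sketch: body_content_length is a hypothetical helper name, not an existing
Catalyst or Engine method, and the error message is made up for
illustration. It refuses to compute a length for a string that still has
the utf8 flag on, and returns the octet count otherwise.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not part of Catalyst): return the byte length of a
# response body, but refuse strings that still carry the utf8 flag, since
# the output encoding is unknown at this point.
sub body_content_length {
    my ($body) = @_;
    die "body has the utf8 flag set; encode it before output\n"
        if utf8::is_utf8($body);
    return length $body;
}

# Decoding produces a flagged character string (4 characters here).
my $chars = Encode::decode('UTF-8', "caf\xC3\xA9");

# Encoding produces an unflagged octet string (5 octets here).
my $bytes = Encode::encode('UTF-8', $chars);

print body_content_length($bytes), "\n";

# The decoded string is rejected instead of guessing an encoding.
my $ok = eval { body_content_length($chars); 1 };
print "exception: $@" unless $ok;
```

With a check like that, the caller has to pick an encoding explicitly
before the length is computed, rather than having one guessed for it.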




-- 

Bill Moseley
moseley at hank.org