[Catalyst] Re: Avoiding UTF8 in Catalyst

Jonathan Rockway jon at jrock.us
Wed Dec 9 03:05:32 GMT 2009


* On Tue, Dec 08 2009, Bill Moseley wrote:
> On Tue, Dec 8, 2009 at 12:26 AM, Aristotle Pagaltzis <pagaltzis at gmx.de> wrote:
>
>     There is no such thing as an octet stream in Perl. There are only
>     strings, and strings are sequences of arbitrarily large integers.
>
> Help me out here.
>
> What I've stuck in my mind is that the poorly-named utf8 flag on Perl strings is really
> the "is_character_data" flag.  To get character data it *must* be decoded on
> input, and the act of decoding sets that flag.  Even decoding an 8-bit character
> encoding will set the flag.

Sorry, it doesn't mean that.  latin1 text is character data, but need
not have the UTF8 flag on.  The UTF8 flag doesn't mean anything more than
any of the other SV flags.  All of these flags are basically performance
hacks and should be considered totally off-limits to user code.  They
have absolutely no meaning there.
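
To make that concrete, here is a small sketch (the variable names are
just for illustration).  It uses utf8::upgrade to change only the
internal representation of a string and then compares it with an
untouched copy.  utf8::is_utf8 appears here purely to show the flag
changing; it is not something application code should branch on.

```perl
use strict;
use warnings;

# U+00E9 (e with acute accent) fits in Latin-1, so Perl can store this
# string with or without the UTF8 flag.
my $narrow = "caf\x{e9}";
my $wide   = "caf\x{e9}";
utf8::upgrade($wide);    # changes the internal representation only

my $flag_narrow = utf8::is_utf8($narrow) ? 1 : 0;
my $flag_wide   = utf8::is_utf8($wide)   ? 1 : 0;
my $same_string = $narrow eq $wide ? 1 : 0;
my $same_length = length($narrow) == length($wide) ? 1 : 0;

# The flags differ, but as far as Perl-level code is concerned the two
# strings are the same four characters.
print "flags: $flag_narrow $flag_wide, equal: $same_string, same length: $same_length\n";
```

Both strings are character data; only one of them has the flag on.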

> And any strings with the flag set *must* be encoded before printing (sending out of
> Perl) -- otherwise you are printing abstract "characters" that have no meaning outside
> of Perl.

Any string without the flag set must also be encoded.
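
As a rough illustration (again with made-up variable names), encoding
the same four characters gives the same octets whether or not the flag
happens to be set:

```perl
use strict;
use warnings;
use Encode qw(encode);

my $flag_off = "caf\x{e9}";    # 4 characters, UTF8 flag off
my $flag_on  = "caf\x{e9}";
utf8::upgrade($flag_on);       # same 4 characters, UTF8 flag on

# Encode both for output; the internal flag makes no difference.
my $out_off = encode('UTF-8', $flag_off);
my $out_on  = encode('UTF-8', $flag_on);

printf "%d octets, identical: %s\n",
    length($out_off), ($out_off eq $out_on ? "yes" : "no");
```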

If text ever enters your application, it must do so through a call to
decode.  If text ever leaves your application, it must do so through a
call to encode.

Your application must always, without exception, decode and encode all
text data.
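
A minimal sketch of that rule using the Encode module, assuming the
outside world speaks UTF-8 (that choice, and the variable names, are
assumptions for the example; use whatever encoding your input and
output really have):

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Octets as they might arrive from a socket or a file: "cafe" with an
# acute accent on the e, encoded as UTF-8 (5 octets).
my $bytes_in = "caf\xc3\xa9";

# Input boundary: decode octets into characters (4 characters).
my $text = decode('UTF-8', $bytes_in);

# Inside the application, work on characters only.
my $shout = uc $text;

# Output boundary: encode characters back into octets.
my $bytes_out = encode('UTF-8', $shout);

binmode STDOUT;    # we are printing octets we encoded ourselves
print $bytes_out, "\n";
```

An I/O layer such as binmode(STDOUT, ':encoding(UTF-8)') can do the
encode step for you, but then you must not also call encode yourself,
or the data gets encoded twice.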

It's confusing because this is sometimes done automatically by libraries
that are in use.  It's confusing because sometimes it's *not* done by
the libraries that are in use :) If you're not sure if your library is
doing this for you, read the source, or ask someone :)

Regards,
Jonathan Rockway

--
print just => another => perl => hacker => if $,=$"


