[Catalyst] Re: Avoiding UTF8 in Catalyst

Bill Moseley moseley at hank.org
Tue Dec 8 15:57:45 GMT 2009


On Tue, Dec 8, 2009 at 12:26 AM, Aristotle Pagaltzis <pagaltzis at gmx.de> wrote:

>
> There is no such thing as an octet stream in Perl. There are only
> strings, and strings are sequences of arbitrarily large integers.
>

Help me out here.

What has stuck in my mind is that the poorly named utf8 flag on Perl
strings is really an "is_character_data" flag. To get character data, input
*must* be decoded, and the act of decoding sets that flag. Even decoding an
8-bit character encoding will set the flag.

$ perl -MEncode -wle '$x=Encode::decode("ASCII", "hello"); print Encode::is_utf8( $x ) ? "flag set\n" : "no flag\n";'
flag set

$ perl -MEncode -wle '$x=Encode::decode("iso-8859-1", "hello"); print Encode::is_utf8( $x ) ? "flag set\n" : "no flag\n";'
flag set
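A slightly longer sketch of the same round trip, using Encode's decode,
encode and is_utf8 (the "café" sample is only an illustration): decoding
octets yields a flagged character string, and encoding it back yields
unflagged octets identical to the input.

```perl
use strict;
use warnings;
use Encode qw(decode encode is_utf8);

# Raw octets as they might arrive from a socket: "café" in UTF-8.
my $octets = "caf\xc3\xa9";

# Decoding turns octets into characters and sets the internal flag.
my $chars = decode('UTF-8', $octets);
print( is_utf8($chars) ? "decoded: flag set\n" : "decoded: no flag\n" );

# Encoding turns characters back into octets; the result is unflagged.
my $bytes = encode('UTF-8', $chars);
print( is_utf8($bytes) ? "encoded: flag set\n" : "encoded: no flag\n" );
```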

And any string with the flag set *must* be encoded before printing (sending
it out of Perl); otherwise you are printing abstract "characters" that have
no meaning outside of Perl.
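As an illustration, here are two common ways to encode on the way out,
sketched with in-memory filehandles so nothing touches STDOUT: calling
Encode::encode yourself, or pushing an :encoding(UTF-8) layer onto the
handle. Both should produce the same octets. Printing the character string
to a handle with no encoding at all would instead trigger Perl's "Wide
character in print" warning for the U+263A character.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Character data; \x{263A} is above 255, so it has no single-octet form.
my $chars = "caf\x{e9} \x{263A}";

# Option 1: encode explicitly, then print plain octets.
open my $out1, '>', \my $buf1 or die "open: $!";
print {$out1} encode('UTF-8', $chars);
close $out1;

# Option 2: let an :encoding layer on the handle do the encoding.
open my $out2, '>:encoding(UTF-8)', \my $buf2 or die "open: $!";
print {$out2} $chars;
close $out2;

print $buf1 eq $buf2 ? "same octets\n" : "different octets\n";
```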

Plus, content_length must be the encoded length in bytes. Therefore it's
impossible to set the content length on character data without encoding it
first.
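To make the length point concrete, a small sketch (sample text invented):
the same ten-character string is 12 octets as UTF-8 but 10 octets as
ISO-8859-1, so the Content-Length depends on the encoding chosen, not on
the characters.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $body   = "na\x{ef}ve caf\x{e9}";        # "naïve café": 10 characters
my $utf8   = encode('UTF-8', $body);        # ï and é take 2 octets each
my $latin1 = encode('iso-8859-1', $body);   # every character is 1 octet

printf "characters:            %d\n", length $body;
printf "Content-Length (UTF-8): %d\n", length $utf8;
printf "Content-Length (Latin-1): %d\n", length $latin1;
```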

So the code seems like it must be:

    use Encode ();

    # A flagged string is characters, which have no byte length yet.
    die "no clue how long the body is because it's still characters"
        if Encode::is_utf8( $response->body );
    $response->content_length( length( $response->body ) );

That's not very friendly, of course.  But, what other choice is there?

The correct thing would be to force every response to have a defined content
type and then encode the characters at the end of the request, right before
setting the content length.
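A rough sketch of that end-of-request step, under several assumptions:
finalize_body is a hypothetical helper, the response is a plain hash rather
than a real Catalyst response object, the charset comes from the
Content-Type (defaulting to UTF-8 when none is given), and the utf8 flag is
used as the "is character data" test described above.

```perl
use strict;
use warnings;
use Encode qw(decode encode is_utf8);

# Hypothetical end-of-request hook; $res is a plain hash standing in for a
# response object, not Catalyst's actual API.
sub finalize_body {
    my ($res) = @_;

    # Require a content type so we know which encoding to use.
    my $type = $res->{content_type}
        or die "response has no content type; cannot choose an encoding\n";

    # Take the charset parameter, defaulting to UTF-8 (an assumption).
    my ($charset) = $type =~ /;\s*charset=["']?([^;"'\s]+)/i;
    $charset //= 'UTF-8';

    # Flag-as-character-data heuristic: flagged strings are characters and
    # get encoded; unflagged strings are treated as octets already.
    if ( is_utf8( $res->{body} ) ) {
        $res->{body} = encode( $charset, $res->{body} );
    }

    # Only now is the body octets, so its length is the Content-Length.
    $res->{content_length} = length $res->{body};
    return $res;
}

# Body decoded from UTF-8 input, as it would be after decoding on input.
my $res = finalize_body({
    content_type => 'text/html; charset=utf-8',
    body         => decode( 'UTF-8', "<p>caf\xc3\xa9</p>" ),
});
print "$res->{content_length}\n";
```

Because encoding happens before the length is taken, content_length here
reflects the 12 octets of UTF-8 rather than the 11 characters.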

-- 
Bill Moseley
moseley at hank.org