[Catalyst] Re: Avoiding UTF8 in Catalyst
Bill Moseley
moseley at hank.org
Tue Dec 8 15:57:45 GMT 2009
On Tue, Dec 8, 2009 at 12:26 AM, Aristotle Pagaltzis <pagaltzis at gmx.de> wrote:
>
> There is no such thing as an octet stream in Perl. There are only
> strings, and strings are sequences of arbitrarily large integers.
>
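A small illustration of the quoted point, as a sketch (the helper name codes() is made up for this example): Perl stores a UTF-8 byte sequence, a decoded string, and a string with a code point above 255 all the same way, as integers. Nothing in the value says "these are octets"; only the program knows.

```perl
use strict;
use warnings;

# Hypothetical helper: return the integers that make up a string,
# one per character.
sub codes {
    my ($s) = @_;
    return map { ord($_) } split //, $s;
}

my $octets = "caf\xc3\xa9";   # five integers, all below 256 (UTF-8 bytes)
my $chars  = "caf\x{e9}";     # four integers (code points)
my $wide   = "\x{263A}";      # one integer above 255

# Prints:
#   99 97 102 195 169
#   99 97 102 233
#   9786
print join( ' ', codes($_) ), "\n" for $octets, $chars, $wide;
```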
Help me out here.
What I've got stuck in my mind is that the poorly named utf8 flag on Perl
strings is really the "is_character_data" flag. To get character data
it *must* be decoded on input, and the act of decoding sets that flag. Even
decoding an 8-bit character encoding will set the flag.
$ perl -MEncode -wle '$x=Encode::decode("ASCII", "hello"); print Encode::is_utf8( $x ) ? "flag set\n" : "no flag\n";'
flag set
$ perl -MEncode -wle '$x=Encode::decode("iso-8859-1", "hello"); print Encode::is_utf8( $x ) ? "flag set\n" : "no flag\n";'
flag set
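The same experiment with non-ASCII input also shows why the flag matters for what comes next: decoding changes the length. A minimal sketch, assuming UTF-8 input (the sample string is made up): five raw octets become four characters, and only the decoded string carries the flag.

```perl
use strict;
use warnings;
use Encode qw(decode);

my $input = "caf\xc3\xa9";             # raw octets, as read from a socket or file
my $text  = decode( 'UTF-8', $input ); # decoding yields character data

# Prints "input: 5 units, flag off" then "text:  4 units, flag on"
printf "input: %d units, flag %s\n", length($input),
    Encode::is_utf8($input) ? 'on' : 'off';
printf "text:  %d units, flag %s\n", length($text),
    Encode::is_utf8($text) ? 'on' : 'off';
```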
And any strings with the flag set *must* be encoded before printing (sending
out of Perl) -- otherwise you are printing abstract "characters" that have
no meaning outside of Perl.
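A sketch of encoding at the output boundary, assuming UTF-8 is the chosen encoding. An in-memory filehandle stands in for the socket so the octets that would go out can be inspected.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $text = "caf\x{e9} \x{263A}";   # character data, including a wide character

# Encode explicitly, then print octets
my $out = '';
open my $fh, '>', \$out or die "cannot open in-memory handle: $!";
print {$fh} encode( 'UTF-8', $text );
close $fh;

# $out now holds 9 octets: "caf\xc3\xa9 \xe2\x98\xba"
```

Printing $text directly to a raw handle would instead trigger a "Wide character in print" warning and emit Perl's internal representation. Pushing an :encoding(UTF-8) layer with binmode is an equivalent alternative to calling encode() by hand.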
Plus, content_length must be the encoded length. Therefore, it's impossible
to set the content length on character data unless you encode it first.
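To make the length argument concrete, a small sketch using a made-up body string: the character count is fixed, but the number of octets on the wire depends on which encoding is chosen, so no Content-Length can be computed until that choice is made.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $body = "caf\x{e9}";   # character data: four characters

my $utf8   = encode( 'UTF-8',      $body );
my $latin1 = encode( 'iso-8859-1', $body );

printf "characters: %d\n", length $body;     # 4
printf "UTF-8:      %d\n", length $utf8;     # 5
printf "ISO-8859-1: %d\n", length $latin1;   # 4
```

That the ISO-8859-1 length happens to match the character count here is a coincidence of the sample text, not something to rely on.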
So the code seems like it must be:
# A flagged body is still characters, so its length in octets is unknown
die "no clue how long the body is because it's still characters"
    if Encode::is_utf8( $response->body );
$response->content_length( length( $response->body ) );
That's not very friendly, of course. But what other choice is there?
The correct thing would be to force all responses to have a defined content
type and then encode the characters at the end of the request (right before
setting content length).
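One way the proposal could look, as a sketch only: finalize_body() below is a hypothetical helper, not part of Catalyst's API. It takes the body as character data plus the response's content type, refuses to guess when no charset is declared, encodes once, and derives the length from the encoded octets.

```perl
use strict;
use warnings;
use Encode qw(encode FB_CROAK LEAVE_SRC);

# Hypothetical end-of-request step: encode the character body using the
# charset from the content type, then compute the length from the octets.
sub finalize_body {
    my ( $body, $content_type ) = @_;
    my ($charset) = $content_type =~ /;\s*charset=([\w.:-]+)/i
        or die "refusing to guess: no charset in '$content_type'\n";
    # FB_CROAK: die on characters the charset cannot represent
    my $octets = encode( $charset, $body, FB_CROAK() | LEAVE_SRC() );
    return ( $octets, length $octets );
}

my ( $octets, $len ) = finalize_body( "caf\x{e9}", 'text/html; charset=UTF-8' );
print "$len\n";   # 5
```

FB_CROAK is used so that a character the declared charset cannot represent (for example \x{263A} under ISO-8859-1) fails loudly instead of being silently substituted.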
--
Bill Moseley
moseley at hank.org