[Catalyst] Re: Avoiding UTF8 in Catalyst

Jonathan Rockway jon at jrock.us
Tue Dec 8 05:34:23 GMT 2009


Sorry to dig up a very old thread, but I am very behind on email and
wanted to comment :)

* On Sun, Nov 22 2009, Aristotle Pagaltzis wrote:
> So I went trawling the Catalyst sources and found what appears
> to be the offending line. From finalize_headers in Catalyst.pm:
>
>     # everything should be bytes at this point, but just in case
>     $response->content_length( bytes::length( $response->body ) );
>
> I was shocked to discover this! Any code that uses bytes::length
> is automatically broken.

FWIW, we did this so that people who were not using
Catalyst::Plugin::Unicode but had a Unicode string in memory would get
something resembling the correct result.  The next line of code
basically copies the char* backing the SV into the response socket.
That is also wrong, but it works for the correct case and for many
incorrect-but-still-common cases.  (I know a lot of prominent Catalyst
developers who had their apps horribly wrong for years, but still used
their websites to make millions of dollars.  It is nice to get
everything right all the time, but sometimes you don't...)

Basically, if you are doing things right, this code will cause no harm
(as the string will be an octet stream, and bytes::length will return
the length of the octet stream you are about to send).  If you are doing
things wrong, you might get the right answer (because you will get the
length of your octet stream that you are about to send, and those octets
happen to represent utf-8 or latin-1, and that's what your content-type
header said you would send).  A "you fail" error would be nice... but
could be annoying in a number of cases.  HTTP is a binary protocol, but
people need to send text, so there is an impedance mismatch.
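
To make the difference concrete, here is a minimal sketch (not the
Catalyst code itself; the variable names and sample strings are made up
for illustration).  It compares length() on a character string, length()
on an explicitly encoded octet string, and bytes::length() on the
unencoded string, which just reports the size of perl's internal buffer:

```perl
use strict;
use warnings;
use Encode qw(encode);
use bytes ();    # makes bytes::length available without enabling the pragma

# Hypothetical response body: a character string that was never encoded.
# The wide character U+263A forces perl to store it internally as UTF-8.
my $body = "na\x{ef}ve \x{263a}";

my $chars  = length $body;              # counts characters
my $octets = encode('UTF-8', $body);    # the octets that should go on the wire
my $wire   = length $octets;            # counts octets, as $octets holds bytes
my $guess  = bytes::length($body);      # size of the internal buffer

# Same idea, but every character fits in Latin-1, so perl keeps the
# string in its one-byte-per-character internal form.
my $latin       = "na\x{ef}ve";
my $latin_guess = bytes::length($latin);
my $latin_wire  = length encode('UTF-8', $latin);

# Upgrading changes the internal representation, not the characters.
my $upgraded = $latin;
utf8::upgrade($upgraded);
my $upgraded_chars = length $upgraded;
my $upgraded_guess = bytes::length($upgraded);

print "wide:     chars=$chars wire=$wire bytes::length=$guess\n";
print "latin:    wire=$latin_wire bytes::length=$latin_guess\n";
print "upgraded: chars=$upgraded_chars bytes::length=$upgraded_guess\n";
```

In the first case bytes::length happens to agree with the UTF-8 octet
count, which is the "wrong but you get the right answer" situation.  In
the Latin-1 case it only agrees if you really meant to send Latin-1, and
the answer changes once the same string is upgraded.  Encoding explicitly
and calling plain length() on the result does not depend on the internal
representation at all.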

Catalyst's Unicode handling has been a nightmare because of the
weird-ass things people do with "Unicode", general misunderstanding, and
backwards compatibility.  (I recall someone wanting the URLs in their
app to be EUC_JP-encoded, but the form submissions to be UTF-8.)

When it's possible to break Catalyst backcompat severely, a correct
solution will be implemented.  But for now, trying hard to Do The Right
Thing (instead of causing weird web browser errors) is what we're stuck
with.

Regards,
Jonathan Rockway

--
print just => another => perl => hacker => if $,=$"


