[Catalyst] Re: Avoiding UTF8 in Catalyst

Mon Nov 23 21:08:32 GMT 2009

* Bill Moseley <moseley at hank.org> [2009-11-23 20:10]:
> I'd argue that when it's time to set the length it should die
> if utf8 flag is still set.

I’m of two minds about this… it may well be that a string is
correctly encoded but has gotten upgraded, and such a string will
produce the right output anyhow. I’m not sure whether it’s too
stringent to demand that the UTF8 flag be off.

However, the string should be *downgradeable* by that time. If
there are wide characters in it at that time, then throwing an
exception is absolutely the right thing to do. But if there
aren’t, then you can’t decide based on the UTF8 flag whether the
string is correct or not.
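
A minimal sketch of that check, assuming a hypothetical helper
called body_length_or_die (the name is made up for illustration
and is not part of Catalyst). It relies only on the core
utf8::downgrade function with its FAIL_OK argument:

```perl
use strict;
use warnings;

# Hypothetical helper, not a Catalyst API: return the byte length
# of a response body, or die if it still holds wide characters.
sub body_length_or_die {
    my ($body) = @_;    # a copy, so the caller's string is untouched

    # With a true second argument, utf8::downgrade returns false
    # instead of dying when a character above 0xFF is present.
    utf8::downgrade($body, 1)
        or die "Body contains wide characters; encode it first\n";

    return length $body;
}
```

The UTF8 flag itself is never consulted here: an upgraded string
without wide characters passes, and only a string that cannot be
represented as bytes is rejected.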

As I wrote, you can read a binary file, upgrade the string, and
output it right back, and you’ll get an identical copy of the
file out of that, because a string means one and the same thing
regardless of whether it’s upgraded.
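
To illustrate, here is a small sketch that uses an in-memory
filehandle in place of a real file. The function name roundtrip
is hypothetical, chosen only for this example:

```perl
use strict;
use warnings;

# Hypothetical demo function: upgrade a byte string, write it to
# a binary-mode handle, and return what was actually written.
sub roundtrip {
    my ($bytes) = @_;

    my $copy = $bytes;
    utf8::upgrade($copy);    # changes the internal representation only

    my $buffer = '';
    open my $out, '>:raw', \$buffer or die "open failed: $!";
    print {$out} $copy;
    close $out or die "close failed: $!";

    return $buffer;
}

# Every possible byte value, as stand-in "binary file" content.
my $bytes = join '', map { chr } 0 .. 255;
my $written = roundtrip($bytes);

print $written eq $bytes ? "identical\n" : "different\n";
```

The comparison is done with eq, which looks at the characters in
the string and not at how Perl happens to store them.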

> When calculating the length the content should have already
> been encoded.

Yes.
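
A rough sketch of that ordering, assuming the output encoding is
UTF-8 and using a hypothetical helper named encode_body (this is
not how Catalyst itself does it):

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: encode character data first, then measure
# the resulting octets for the Content-Length header.
sub encode_body {
    my ($text) = @_;

    my $octets = Encode::encode('UTF-8', $text);

    # length() now counts bytes, because $octets is a byte string.
    return ($octets, length $octets);
}
```

Taking the length before encoding would count characters instead,
which is wrong as soon as any character needs more than one byte.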

> Again, at some point decoding and encoding  should be core not
> just a plugin.  It's an important part of the request cycle.

I agree.

It’s difficult to make it fully automatic, though, because
browsers suck so badly at telling you what encoding the data
they’re sending is in.

I am working on a plugin for that, but due to its dependencies
and API I don’t know if it’d be reasonable to make it core.

Regards,
-- 
Aristotle Pagaltzis // <http://plasmasturm.org/>