[Catalyst] Re: Avoiding UTF8 in Catalyst

Mon Nov 23 16:43:25 GMT 2009

* Marc SCHAEFER <schaefer at alphanet.ch> [2009-11-23 17:20]:
> On Mon, Nov 23, 2009 at 07:42:06AM -0800, Bill Moseley wrote:
> >Still not following.   You are talking about Catalyst::View::TT?
>
> It appears that the latin1 -> htmlentities conversion is done
> by View::TT's html_entity filter, e.g.:
>
>    [% FOREACH h IN cols %]<td>[% b.$h | html_entity %]</td>[% END %]
>
> This is perfectly OK, even if not strictly required. I thought
> it was something else doing that, but it isn't.

If you use the `html` filter instead of `html_entity`, it will
escape only the characters that have to be: `<`, `>`, `&` and `"`.

> There is still apparently something wrong: there is absolutely
> no reason why a Perl Unicode string should be used, but I was
> unable to determine why it was created (upgraded) in the first
> place.

There is no reason why such a string should NOT be used either.
The meaning of the string doesn’t change. It’s an implementation
detail in perl whether the string has been upgraded or not.
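To make that concrete, here is a minimal sketch using only core perl.
The variable names are mine, purely for illustration. It takes a
Latin-1 string, upgrades a copy with utf8::upgrade, and compares the
two at the character level:

```perl
use strict;
use warnings;

# "\xE9" is e-acute: one character. In a plain literal like this,
# perl stores the string in its native 8-bit (downgraded) form.
my $plain = "caf\xE9";

# Copy it and force the copy into perl's internal UTF-8 storage.
my $upgraded = $plain;
utf8::upgrade($upgraded);

# Character-level view: the two strings are indistinguishable.
my $same         = $plain eq $upgraded ? 1 : 0;
my $len_plain    = length $plain;
my $len_upgraded = length $upgraded;

# Only the internal flag differs, and normal code should not care.
my $flag_plain    = utf8::is_utf8($plain)    ? 1 : 0;
my $flag_upgraded = utf8::is_utf8($upgraded) ? 1 : 0;

print "equal: $same\n";
print "length: $len_plain vs $len_upgraded\n";
print "internal flag: $flag_plain vs $flag_upgraded\n";
```

Both strings compare equal and both have a length of 4 characters;
the only thing that changes is the storage flag.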

The bug is that bytes::length is being used to get its length.

> The fact is that counting bytes from the Perl Unicode upgraded
> string is wrong when using ISO-8859-1.

Using bytes::length is ALWAYS wrong. No, really, it’s ALWAYS
wrong. (See the long rant in the other mail I just sent for an
explanation.)
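A short sketch of the difference, assuming the body is character data
that will go out as ISO-8859-1 (the variable names are hypothetical,
and this is not what Catalyst itself does). bytes::length reports the
size of perl's internal storage, which depends on whether the string
happens to be upgraded. The number you want for a Content-Length is
the length of the string after encoding it to the charset you actually
send:

```perl
use strict;
use warnings;
use Encode qw(encode);
use bytes ();   # makes bytes::length callable without enabling byte semantics

my $body = "caf\xE9";          # 4 characters
my $upgraded = $body;
utf8::upgrade($upgraded);      # same 4 characters, different storage

# bytes::length looks at the internal buffer, so the answer depends
# on an implementation detail rather than on the string's content.
my $internal_plain    = bytes::length($body);
my $internal_upgraded = bytes::length($upgraded);

# Encode first, then take the ordinary length of the resulting octets.
my $latin1_octets = encode('ISO-8859-1', $upgraded);
my $utf8_octets   = encode('UTF-8',      $upgraded);

my $content_length_latin1 = length $latin1_octets;
my $content_length_utf8   = length $utf8_octets;

print "bytes::length: $internal_plain vs $internal_upgraded\n";
print "ISO-8859-1 octets: $content_length_latin1\n";
print "UTF-8 octets: $content_length_utf8\n";
```

For the same four characters, bytes::length gives 4 or 5 depending on
the storage, while the encoded length is 4 for ISO-8859-1 and 5 for
UTF-8 no matter how perl held the string internally.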

> Maybe Catalyst dropped any support for non UTF-8 charset. By
> doing that it also dropped any support for any charset having
> a bytesize different than the Perl Unicode upgraded string
> internal format, apparently.

It’s plainly a bug in Catalyst that it uses bytes::length.

I had an IRC convo with Tomas Doran last night and explained the
problem to him. He knocked out some tests for the broken
behaviour. It should be all fixed in the next release, and then
you can upgrade and throw away that `before finalize_headers`
workaround.

Regards,
-- 
Aristotle Pagaltzis // <http://plasmasturm.org/>