[Catalyst] Avoiding UTF8 in Catalyst

Marc SCHAEFER schaefer at alphanet.ch
Mon Nov 23 16:18:02 GMT 2009


On Mon, Nov 23, 2009 at 07:42:06AM -0800, Bill Moseley wrote:
> Still not following.   You are talking about Catalyst::View::TT?

It appears that the latin1 -> HTML entities conversion is done by
the html_entity filter in View::TT, e.g.:

    [% FOREACH h IN cols %]<td>[% b.$h | html_entity %]</td>[% END %]

This is perfectly OK, even if not strictly required.  I thought it was
something else doing that, but it isn't.

> BTW -- when looking at C::V::TT I see where you got that DEFAULT_ENCODING
> from -- it's documented in C::V::TT.

The simple fact that html_entity above changes é (iso-8859-1) into &eacute;
means that something must have understood that I am using iso-8859-1, which
is good. But you seem to be right:

> As far as I know there's no such setting in Template Toolkit.  There's
> "ENCODING" to specify the encoding of your templates.

I am using:

   package MyApp::View::TT;

   use strict;
   use base 'Catalyst::View::TT';

   __PACKAGE__->config(TEMPLATE_EXTENSION => '.tt',
                       FILTERS => { 'latex' => \&latex },
                       DEFAULT_ENCODING   => 'iso-8859-1',
                       WRAPPER => 'wrapper.tt');

You are, however, right that removing the DEFAULT_ENCODING above
doesn't change anything. Replacing it with ENCODING => 'utf-8'
creates a charset conversion bug (which is expected). Replacing it with
ENCODING => 'iso-8859-1' doesn't change anything. So I can safely
assume that, as usually expected, iso-8859-1 is the default.  I have now
removed this specification altogether.

> If your templates are 8859-1 with 8 bit characters my suggestion would be to
> convert them to utf-8 and set ENCODING to utf8 for the templates, and move
> toward utf8 everywhere.    Make sure you use the plugin to decode and
> encode.

Again, utf8 is out of the question here: be it in the source file, the
database, or the output. UTF-8 is unacceptable in our environment.

My problem (Catalyst sending iso-8859-1 data to the browser, but with
a wrong Content-Length:, as if counting the bytes of the UTF-8
equivalent (or of the Perl Unicode upgraded string, as mentioned in a separate
mail by Aristotle Pagaltzis)) was solved by adding the following to MyApp.pm:

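# 'before' is a Moose method modifier, so MyApp.pm needs "use Moose;"
before 'finalize_headers'
   => sub {
         my $c = shift;

         my $s = $c->response->body;
         # Only touch plain string bodies, not undef or filehandles.
         if (defined $s && !ref $s) {
            # Back to one byte per character, so the length matches iso-8859-1.
            utf8::downgrade($s);
            $c->response->body($s);
         }
      };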

There is still apparently something wrong: there is absolutely no reason
why a Perl Unicode string should be used, but I was unable to determine
why it was created (upgraded) in the first place.

The fact is that counting bytes from the Perl Unicode upgraded string is
wrong when using ISO-8859-1.
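
To illustrate the mismatch, here is a minimal sketch in plain Perl, outside
Catalyst. It assumes that the Content-Length is derived from the internal
byte length of the body (as bytes::length reports it); I have not checked
that this is exactly what Catalyst does. The string and the variable names
are made up for the example.

```perl
use strict;
use warnings;
use bytes ();   # makes bytes::length available without enabling byte semantics

# "café", with é stored as the single iso-8859-1 byte 0xE9.
my $body = "caf\xe9";

# Not upgraded yet: one byte per character internally.
my $len_native = bytes::length($body);

# Stand-in for whatever upgraded the string; the characters do not change.
utf8::upgrade($body);
my $is_upgraded  = utf8::is_utf8($body) ? 1 : 0;
my $len_chars    = length($body);          # character count
my $len_upgraded = bytes::length($body);   # é now takes two bytes internally

# The workaround from the hook above: back to one byte per character.
utf8::downgrade($body);
my $len_fixed = bytes::length($body);

printf "native=%d chars=%d upgraded=%d fixed=%d\n",
    $len_native, $len_chars, $len_upgraded, $len_fixed;
```

The upgraded string reports one byte more than what actually goes out on
the wire as iso-8859-1, which is the kind of discrepancy I see in
Content-Length. utf8::is_utf8 can be used in the same way to check whether a
given body has been upgraded.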

Maybe Catalyst dropped support for non-UTF-8 charsets. By doing that
it apparently also dropped support for any charset whose byte size differs
from the internal format of a Perl Unicode upgraded string.

But I am no expert on this.




