[Catalyst] mod_perl converts latin1 to utf8 !?

Bjørn-Helge Mevik bhx6 at mevik.net
Fri Dec 26 20:03:34 GMT 2008


Dami Laurent (PJ) wrote:

> Mod_perl probably does nothing to your encoding, but Apache might interfere.

Hm.  That's a thought...

> a) did you configure your Apache as follows ?
>
>    AddDefaultCharset iso-8859-1

I've now tried that, but it had no effect on the encoding.

> b) try to look at the HTTP traffic (using Firebug or Fiddler2), to
> see if there are some other charset=... headers generated by some
> component in your chain.

I should probably start using tools like that -- so far I've only used
telnet for looking at the HTTP traffic. :-)  Anyway, there is only one
"Content-Type: text/html; charset=" HTTP header, and it conforms to
the setting in the process() method of Catalyst::View::TT.

(I've also tried adding an http-equiv="Content-Type" meta tag to the
<head>, but to no avail.)
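
For anyone repeating the header check on a response captured with telnet, here is a minimal sketch that lists every charset declared in the response head. The function name declared_charsets() and the sample response are hypothetical, made up for illustration; they are not part of Catalyst or mod_perl.

```perl
use strict;
use warnings;

# Return every charset declared on a Content-Type header line of a raw
# HTTP response. Only the head (text before the first blank line) is
# examined, so meta tags in the body are ignored.
sub declared_charsets {
    my ($raw_response) = @_;
    my ($head) = split /\r?\n\r?\n/, $raw_response, 2;
    $head = '' unless defined $head;
    my @charsets;
    for my $line (split /\r?\n/, $head) {
        next unless $line =~ /^Content-Type\s*:/i;
        push @charsets, lc $1 if $line =~ /charset\s*=\s*"?([^\s";]+)/i;
    }
    return @charsets;
}

# Hypothetical captured response, for illustration only.
my $sample = join "\r\n",
    'HTTP/1.1 200 OK',
    'Content-Type: text/html; charset=iso-8859-1',
    'Content-Length: 0',
    '', '';

print join(', ', declared_charsets($sample)), "\n";
```

If the list has more than one entry, or an entry that differs from what the view sets, some other component in the chain is adding or rewriting the header.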
 
> c) try some static latin1 pages in Apache htdocs to see if they are
> rendered correctly.

I've tried static latin1 and utf8 pages, and they are rendered
correctly: Apache does not change the encoding of the characters.  If
the page contains an http-equiv="Content-Type" meta tag, it
is respected, otherwise Apache looks at the characters and sets the
HTTP content-type header correctly.

Further, I wrote a small module with a handler() and ran it under
mod_perl (outside the Catalyst application):

===============
package Enctest;
use strict;
use warnings;
use Encode;
#use utf8;

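# No "()" prototype here: the handler receives the request object.
sub handler {
    my $r = shift;
    my $A = "<p>æøå</p>";
    $r->content_type('text/html; charset=iso8859-1');
    #$r->content_type('text/html; charset=utf-8');
    $r->print("<html>$A");                         # string as Perl holds it
    $r->print(encode('ISO-8859-1', $A));           # explicitly latin1
    $r->print(encode('UTF-8', $A) . "</html>");    # explicitly utf8
    return 0;    # 0 is Apache's OK status
}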

1;
=============

I tested all combinations of
- Storing the file as latin1 vs. utf8
- With and without "use utf8;"
- charset iso8859-1 vs. utf-8

In all combinations, Apache+mod_perl faithfully reproduced the bytes
that, as far as I understand, Perl should output in the different
print()s.
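
As a rough sketch of the bytes the two encode() calls should produce, the following standalone script (not the test module itself) hex-dumps them for "æøå". The helper hex_bytes() is a hypothetical name, and the strings are written with escapes so the result does not depend on how the script is saved.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Render a byte string as space-separated hex, e.g. "e6 f8 e5".
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02x', $_ } unpack 'C*', $bytes;
}

# What $A holds for "æøå" when "use utf8" is NOT in effect.
my %cases = (
    'latin1 source, no "use utf8"' => "\xE6\xF8\xE5",              # 3 bytes
    'utf8 source, no "use utf8"'   => "\xC3\xA6\xC3\xB8\xC3\xA5",  # 6 bytes
);

for my $name (sort keys %cases) {
    my $A = $cases{$name};
    print "$name\n";
    print "  encode('ISO-8859-1', \$A): ", hex_bytes(encode('ISO-8859-1', $A)), "\n";
    print "  encode('UTF-8', \$A):      ", hex_bytes(encode('UTF-8', $A)), "\n";
}
```

A utf8 source with "use utf8" gives the same three characters as the first case.  In the second case encode('UTF-8', $A) encodes bytes that are already utf8 a second time, which is the usual way to end up with doubly encoded output.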

From this it would seem that Apache and mod_perl do not recode the
characters.  Perhaps it could be something that TT does when run under
mod_perl (as this does not happen under the development server)?
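
To compare what the two servers actually send, a sketch like the following could classify the bytes of a captured response body.  The names try_utf8() and classify_bytes() are hypothetical, and the double-encoding test is only a heuristic: it assumes the text consists of characters below U+0100.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Strict UTF-8 decode; returns undef if the bytes are not valid UTF-8.
sub try_utf8 {
    my ($bytes) = @_;    # private copy: decode() with a CHECK value may modify its input
    return eval { decode('UTF-8', $bytes, Encode::FB_CROAK) };
}

# Guess how a byte string is encoded.
sub classify_bytes {
    my ($bytes) = @_;
    return 'ascii only' unless $bytes =~ /[^\x00-\x7F]/;

    my $chars = try_utf8($bytes);
    return 'not UTF-8 (consistent with latin1)' unless defined $chars;

    # If the decoded characters all fit in one byte each, and those
    # bytes are again valid non-ASCII UTF-8, the text was probably
    # encoded twice.
    my $again = eval { encode('ISO-8859-1', $chars, Encode::FB_CROAK) };
    return 'UTF-8, probably double-encoded'
        if defined $again
        && $again =~ /[^\x00-\x7F]/
        && defined try_utf8($again);

    return 'UTF-8';
}

print classify_bytes("\xE6\xF8\xE5"), "\n";              # latin1 bytes for "æøå"
print classify_bytes("\xC3\xA6\xC3\xB8\xC3\xA5"), "\n";  # utf8 bytes for "æøå"
```

Running the same request against the development server and against mod_perl, and classifying both bodies, would show at which point the bytes start to differ.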

-- 
Bjørn-Helge Mevik


