[Catalyst] Re: Encoding problem - ?fastcgi?

Richard Robinson catalyst at beulah.qualmograph.org.uk
Tue Oct 9 14:11:28 GMT 2007


On Mon, Oct 08, 2007 at 09:40:52PM +0100, Richard Robinson wrote:
> On Mon, Oct 08, 2007 at 02:48:48PM -0500, Jonathan Rockway wrote:
> > Richard Robinson wrote:
> > > On Mon, Oct 08, 2007 at 02:07:18PM +0200, A. Pagaltzis wrote:
> > >>
> > >> did you ever get this sorted? Quoting in full since you seem to
> > >> have been warnocked. I have no idea what to suggest but am
> > >> curious about this issue as a matter of personal interest.
> > > ...
> > > No, I haven't got any further, I'm kind of stuck on it. The workaround makes
> > > it go away for now & there are other things I need to be sorting out too, so
> > > I'm just kind of waiting for an idea to appear from somewhere ... 
> > 
> > Can someone *please* distill a test case for this? 
> > 
> > I can guess why this is happening, but I'd like to be sure.  My guess is
> > that utf8-flagged characters are in $c->res somewhere outside of (the
> > utf-8 but not flagged as such) $c->res->body.  When the headers and body
> > are concatenated somewhere down the line, the encoded utf8 octets get
> > upgraded to wide characters as latin-1, and you have your double encoding.
> 
> <bangs head on wall>
> like $c->res->content_type('application/pdf'), maybe ?

Yes. I think you have it.

use Encode ();  # provides decode_utf8 / encode_utf8

sub test_encoding : Path('/test_encoding')
{  my ($self, $c) = @_;
   my $header = 'application/octet-stream';
   if ($c->req->param('header_utf8'))
   {  $header = Encode::decode_utf8($header);  # decoded character string
   } else
   {  $header = Encode::encode_utf8($header);  # encoded byte string
   }
   my $body = "\xc8\xe8";  # upper+lower, latin1 E-grave, latin2 C-caron, ...
   $c->res->content_type($header);
   $c->res->body( $body );
}

Testing this locally:
$ for n in 0 1 ; do
  lwp-request "http://localhost:3000/test_encoding?header_utf8=$n">encode.$n
  done ; file encode.* ; ls -gG encode.*
encode.0: ISO-8859 text, with no line terminators
encode.1: ISO-8859 text, with no line terminators
-rw-r--r-- 1 2 2007-10-09 12:32 encode.0
-rw-r--r-- 1 2 2007-10-09 12:32 encode.1

And through FastCGI/Apache:
$ for n in 0 1 ; do
  lwp-request "http://livetunebook.qualmograph.org.uk/test_encoding?header_utf8=$n">encode.$n
  done ; file encode.* ; ls -gG encode.*
encode.0: ISO-8859 text, with no line terminators
encode.1: UTF-8 Unicode text, with no line terminators
-rw-r--r--  1 2 2007-10-09 13:43 encode.0
-rw-r--r--  1 2 2007-10-09 13:43 encode.1

I notice the UTF-8 file is truncated to the original length: it contains only
the 2 bytes that encode the first character.
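
The double encoding and the truncation can be reproduced in plain Perl,
outside Catalyst. This is only a sketch under assumptions: the variable names
are made up for illustration, utf8::upgrade() stands in for however the header
value ends up utf8-flagged, and it is assumed (not verified) that the
Content-Length is taken from the 2-octet body before the headers and body are
joined and that the output layer writes the flagged string out as UTF-8.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

# Hypothetical stand-ins for the response parts; not Catalyst's internals.
my $body    = "\xc8\xe8";    # two raw octets, as in the test action
my $headers = "Content-Type: application/octet-stream\r\n\r\n";

# Assumption: Content-Length is computed from the body alone.
my $content_length = length($body);    # 2

# Force the internal UTF8 flag on, standing in for a flagged header value.
utf8::upgrade($headers);

# Concatenating flagged and unflagged strings upgrades the body octets
# as if they were Latin-1 characters (U+00C8, U+00E8).
my $message = $headers . $body;

# Assumption: the output layer encodes the flagged string as UTF-8.
my $wire      = encode_utf8($message);
my $wire_body = substr($wire, length($headers));    # headers are pure ASCII

# A client honouring Content-Length keeps only the first 2 octets.
my $received = substr($wire_body, 0, $content_length);

print "flagged:      ", (utf8::is_utf8($message) ? "yes" : "no"), "\n";
print "body on wire: ", unpack('H*', $wire_body), "\n";    # c388c3a8
print "client keeps: ", unpack('H*', $received), "\n";     # c388
```

The two octets c3 88 are the UTF-8 encoding of U+00C8, which would explain a
2-byte file that file(1) reports as UTF-8.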


... and suddenly I can't test this any more, slicehost appears to have gone
missing ??? Pity, I'd like to see whether that does the trick for the real
data. Ah well.

Is that any help ?

-- 
Richard Robinson
"The whole plan hinged upon the natural curiosity of potatoes" - S. Lem



