[Catalyst] Re: decoding in core

Octavian Râşniţă orasnita at gmail.com
Mon Feb 23 18:08:49 GMT 2009


From: "Bill Moseley" <moseley at hank.org>
On Mon, Feb 23, 2009 at 06:45:40PM +0200, Octavian Râşniţă wrote:
> I understand that there are reasons for not transforming all the
> encodings to UTF-8 in core, even though it seems to be not very
> complicated, because maybe there are some tables that contain ISO-8859-2
> chars and other tables that contain ISO-8859-1 chars, and when the data
> need to be saved, it should keep its original encoding.

Don't think about transforming encodings to UTF-8.

In the vast majority of cases people expect to work with characters,
and that's what Perl works with internally.  UTF-8 is an encoding, not
characters.

The HTTP request is octets.  The HTTP request specifies what encoding
those octets represent and it's that encoding that is used to decode
the octets into characters.  The fact that Perl uses UTF-8 internally
is best ignored -- it's just characters inside Perl once decoded.

Conceptually it's not that much different than a request with
"Content-Encoding: gzip" -- before using the request body parameters
the gzipped octets must obviously be decoded.  Likewise, the body must
be url-decoded into separate parameters.  And again, the resulting
octets must be decoded into characters if the parameters are to be
used as characters.  That last step has often been ignored.
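
The three decoding steps described above can be sketched as follows. This is
an illustration in Python rather than Perl, and it is not Catalyst's Engine
code: the function name decode_body and its parameters are hypothetical, and
it assumes a form-encoded body whose charset has already been read from the
request headers.

```python
import gzip
from typing import Dict, Optional
from urllib.parse import unquote_to_bytes


def decode_body(raw: bytes, content_encoding: Optional[str], charset: str) -> Dict[str, str]:
    """Turn request-body octets into character-string parameters.

    Hypothetical helper for illustration only.
    """
    # Step 1: undo the content encoding (compression), if any.
    if content_encoding == "gzip":
        raw = gzip.decompress(raw)

    params: Dict[str, str] = {}
    # Step 2: split the form body and url-decode each piece. The result
    # of this step is still octets, not characters.
    for pair in raw.split(b"&"):
        if not pair:
            continue
        name, _, value = pair.partition(b"=")
        name_octets = unquote_to_bytes(name.replace(b"+", b" "))
        value_octets = unquote_to_bytes(value.replace(b"+", b" "))
        # Step 3: decode the octets into characters using the charset
        # the request declared. This is the step that is often skipped.
        params[name_octets.decode(charset)] = value_octets.decode(charset)
    return params
```

The same function handles a UTF-8 request and an ISO-8859-2 request; only the
charset argument changes, and the caller sees characters in both cases.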

Then when sending a response of (abstract) characters that are inside
Perl they must first be encoded into octets.
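
The outgoing direction can be sketched the same way. Again this is Python used
for illustration, not Catalyst code; encode_response is a hypothetical name,
and the text/html content type is an assumption made for the example.

```python
from typing import Dict, Tuple


def encode_response(body: str, charset: str = "utf-8") -> Tuple[Dict[str, str], bytes]:
    """Encode a character string into octets at the edge of the app.

    Hypothetical helper for illustration only.
    """
    # Characters become octets only here, just before they leave.
    octets = body.encode(charset)
    headers = {
        # Tell the client which encoding the octets represent.
        "Content-Type": "text/html; charset=" + charset,
        # Content-Length counts octets, not characters.
        "Content-Length": str(len(octets)),
    }
    return headers, octets
```

Note that the length of the encoded body depends on the charset chosen, which
is one reason the encoding step belongs in a single place at the edge.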

Those things should be handled at the edge of the application, and
that would be in the Engine (or the code the Engine uses).

Yes, the same thing has to happen with templates, the database, and
all external data sources.  Those are separate issues.  HTTP provides
a standard way to determine how to encode and decode.

OK, but wouldn't it be possible to specify this encoding only once, in a
single place?
Or, better said, if the app uses the C::P::Unicode module, it could assume by
default that the templates, controllers and other parts of the app use
UTF-8, and use a different encoding for one or more of them only if that
encoding is specified explicitly.
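
A minimal sketch of that idea, in Python for illustration only. The class
EncodingConfig and the component names are hypothetical; this is not the API
of C::P::Unicode or of Catalyst, just one way a single default with explicit
per-component overrides could look.

```python
from typing import Dict, Optional


class EncodingConfig:
    """One app-wide default encoding, with optional explicit overrides.

    Hypothetical class for illustration only.
    """

    def __init__(self, default: str = "utf-8",
                 overrides: Optional[Dict[str, str]] = None) -> None:
        self.default = default
        # Only components listed here deviate from the default.
        self.overrides = dict(overrides or {})

    def for_component(self, name: str) -> str:
        """Return the encoding a component should use."""
        return self.overrides.get(name, self.default)


# The encoding is stated once; a single legacy table is the exception.
config = EncodingConfig(overrides={"legacy_table": "iso-8859-2"})
```

With this arrangement, templates and controllers would pick up UTF-8 without
any configuration of their own, and only the exception is spelled out.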

Octavian





