[Catalyst] Backlog for proposed changes in next Catalyst release

Bill Moseley moseley at hank.org
Mon Mar 11 15:48:32 GMT 2013


On Mon, Mar 11, 2013 at 4:53 AM, Will Crawford
<billcrawford1970 at gmail.com> wrote:

> While it's not Catalyst's fault, I've found over the years that
> interacting with underlying libraries, databases and legacy systems is
> generally easier when I *don't* try to force anything. I have custom code
> in place to deal with known sources of inconsistent encodings (check to see
> if it's valid UTF8, look up the remainder in a table painstakingly
> assembled over a period of time that catches a few odd MacRoman characters
> that show up in some of our contributors' data, fall back to latin1 or
> cp1252 for the remainder, leave anything else as \xNN). Everywhere else,
> UTF8 can be passed through quite transparently, so I don't really see the
> point of adding extra decoding and encoding all over the place to switch
> from utf8, to some internal wide character encoding, then back to utf8
> again for output. One of the positive features of UTF8 has always been that
> code that doesn't need to identify any of those fancy accented characters
> can just treat it the same as ASCII, Latin-$WHATEVER or cp1252 without any
> overhead. Overall I can't see the point of forcing everything to be
> converted multiple times ...
>
>
I think we can all agree that historically encoding has been confusing,
misunderstood, and frequently ignored.   And very often just done plain
wrong.

I suspect that, since this is currently a plugin, it's often ignored,
especially by newer developers.  That means "out of the box" Catalyst, as a
web framework, for its typical use, is broken.


One of the typical uses for a Catalyst application is building a web app
that outputs character data.  This character data must be encoded when sent
over the wire.   Likewise, request data that is character data must be
decoded.  Doing these as close to the "edge" of the application as possible
is the best approach.  That's what the plugin does.
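
A minimal sketch of what "at the edge" means, using only the core Encode
module.  The helper names decode_param and encode_body are made up for
illustration and are not the plugin's API; UTF-8 is assumed as the wire
encoding.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helpers (not part of Catalyst or the plugin): they only
# mark where the byte/character boundary sits.
sub decode_param { my ($bytes) = @_; return decode('UTF-8', $bytes) }  # bytes in -> characters
sub encode_body  { my ($chars) = @_; return encode('UTF-8', $chars) }  # characters -> bytes out

my $raw_bytes = "caf\xC3\xA9";                # "café" as it arrives on the wire
my $name      = decode_param($raw_bytes);     # character data inside the app
my $body      = encode_body("Hello, $name");  # bytes again for the response

printf "%d chars in, %d bytes out\n", length($name), length($body);
```

Everything between those two calls works on characters, and nothing else
in the app has to know which encoding was used on the wire.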

As t0m says, if you are ignoring encoding your app is broken.  Sure, it may
not seem so.  Sure, you can ignore those "fancy accented characters" and, if
your app only works in something like ASCII, never notice -- it's just
like before Unicode support was added to Perl.   And you still should set a
charset on the content-type when you send the response, so what are you
going to set it to?

Plus, once you do get some of those fancy characters (used by billions of
people) into your app, length() and everything else that works with
characters (hey, this is Perl) will be broken.
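
To make that concrete, here is a small sketch (the sample string is my own)
comparing the same text as undecoded UTF-8 bytes and as decoded characters:

```perl
use strict;
use warnings;
use Encode qw(decode);

my $bytes = "na\xC3\xAFve";            # UTF-8 bytes for "naïve"
my $chars = decode('UTF-8', $bytes);   # the same text as Perl characters

my $len_bytes = length($bytes);        # counts bytes
my $len_chars = length($chars);        # counts characters

# substr() on the undecoded string cuts the two-byte "ï" in half;
# on the decoded string it returns three whole characters.
my $cut_bytes = substr($bytes, 0, 3);
my $cut_chars = substr($chars, 0, 3);

print "bytes: $len_bytes, chars: $len_chars\n";
```

The undecoded string reports six "characters" for a five-letter word, and
the truncated copy ends in half of a UTF-8 sequence.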


My wild guess from your description above is you are not handling encoding
correctly.  But, in the real world you get character data thrown at you
that is broken in some way.  Perhaps your input is so broken you have to do
what you described.  (Still, I think the correct approach is to decode()
with a useful CHECK value.)   If you are "passing through" UTF8 undecoded,
then unless you are not touching that input (as character data), that's
broken.
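
As a rough sketch of decode() with a CHECK value: try strict UTF-8 first and
fall back only when that fails.  The decode_input helper and the choice of
cp1252 as the fallback are assumptions for illustration (the fallback is
borrowed from Will's description), not a recommendation for every data source.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: returns the decoded characters and the name of
# the encoding that was actually used.
sub decode_input {
    my ($bytes) = @_;

    # FB_CROAK makes decode() die on malformed input instead of quietly
    # substituting U+FFFD; LEAVE_SRC keeps $bytes unmodified.
    my $chars = eval {
        decode('UTF-8', $bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC());
    };
    return ($chars, 'UTF-8') if defined $chars;

    # Assumed fallback for known-bad legacy input.
    return (decode('cp1252', $bytes), 'cp1252');
}

my ($good, $good_enc)     = decode_input("caf\xC3\xA9");  # valid UTF-8
my ($legacy, $legacy_enc) = decode_input("caf\xE9");      # not valid UTF-8
print "$good_enc, $legacy_enc\n";
```

Either way the caller gets character data back and knows which path was
taken, so bad input is handled on purpose rather than by accident.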

You say it's best not to force anything, by which I assume you mean forcing
some encoding.  If you have character input then by nature it's encoded.
You have to know what the encoding is, and decode it as such and be
prepared for bad data.  You wouldn't ignore it if it was base64 or gzipped,
right?  Those are not character encodings, but it's essentially the same
issue.


I have never considered any performance aspect of this.  It never shows up
when we profile "slow" responses.  Plus, it's never been an optional
operation.   We manipulate characters and we exchange data as bytes.  You
have to convert between those.


The plugin should be core to Catalyst.   I think it's pretty safe to add
it if it only encodes if the utf8 flag is set on the body -- that should
prevent double-encodings.   And having a config option to disable is easy.
And if the plugin is found in the app, issue a warning.   It's possible
that someone has their own modified version of the plugin using the same
name.
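
A sketch of the "only encode if the utf8 flag is set" idea.  finalize_body
is a made-up name, not the plugin's actual code, and UTF-8 output is
assumed.  Note that the flag is only a heuristic: a character string that
Perl happens to store without the flag would pass through unencoded.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: encode the response body only when Perl has it
# flagged as character data; otherwise assume it is already bytes.
sub finalize_body {
    my ($body) = @_;
    return $body unless utf8::is_utf8($body);
    return encode('UTF-8', $body);
}

my $chars = decode('UTF-8', "caf\xC3\xA9");  # character data from the app
my $once  = finalize_body($chars);           # encoded to bytes
my $twice = finalize_body($once);            # flag is off, so left alone

print length($once), " bytes either way\n" if $once eq $twice;
```

Running the body through a second time leaves it unchanged, which is the
double-encoding protection I have in mind.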




-- 

Bill Moseley
moseley at hank.org