[Catalyst] Avoiding UTF8 in Catalyst

Peter Edwards peter at dragonstaff.com
Mon Nov 23 17:42:13 GMT 2009

> The fact is that counting bytes from the Perl Unicode upgraded string is
> wrong when using ISO-8859-1.
> Maybe Catalyst dropped any support for non UTF-8 charset. By doing that
> it also dropped any support for any charset having a bytesize different
> than the Perl Unicode upgraded string internal format, apparently.
> But I am no expert on this.
I would recommend using utf-8 throughout, even if you think you'll never
need it.
The reason is that you can accidentally send bytes that appear to be correct
even though they are not, and if you are using an English browser you will
never realise that pure chance is saving you.
You sail along sticking it in a database, in files, in templates and it
works... until one day it doesn't.
Then you are in big trouble with data that might-be-latin-1 or
might-be-utf-8 or might-be-double-encoded.
I watched a colleague spend 3 whole months fixing an internal framework like
that. Proving his fixes worked was very difficult and the historic data in
Oracle, well, there was no reliable way to unbork it.
Many of us have been through a lifetime of pain with latin-1 encoding.
Unless you are dead set on it, it's much easier to use utf-8 throughout and
add a few simple utf-8 unit tests at the input/output boundaries of your
system components.
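
As a rough illustration of the kind of boundary test I mean, here is a
minimal sketch using only core modules (Encode and Test::More). The sample
string and variable names are made up for the example; the last check shows
how double encoding gives itself away by the byte count.

```perl
use strict;
use warnings;
use Encode qw(encode decode);
use Test::More;

# A character string with one non-ASCII character (U+00E9, e-acute).
my $text = "caf\x{e9}";

# Output boundary: characters -> UTF-8 bytes.
my $bytes = encode('UTF-8', $text);
is(length($text),  4, 'four characters before encoding');
is(length($bytes), 5, 'five bytes after encoding as UTF-8');

# Input boundary: bytes -> characters; the round trip must be lossless.
my $round_trip = decode('UTF-8', $bytes);
is($round_trip, $text, 'decode(encode(x)) returns the original characters');

# Double encoding: encoding already-encoded bytes again grows the data,
# because each of the two bytes of U+00E9 is treated as a character.
my $double = encode('UTF-8', $bytes);
is(length($double), 7, 'double-encoded data is seven bytes');

done_testing;
```

Run something like this against the real entry and exit points of each
component (database layer, file readers, template output) rather than
against Encode itself.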

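It's not so hard: add  use open ':utf8';  at the top of your code, or use
binmode $fh, ':utf8';  on open file handles. (The ':encoding(UTF-8)' layer
is the stricter choice, since it also validates the data it reads.)
Use the default utf-8 encoding in Template Toolkit.
When you want to print a variable to a handle that has no such layer, do
use Encode; print encode_utf8($foo);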
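
To make that concrete, here is a small sketch under a few assumptions: the
scratch file comes from File::Temp and exists only for the illustration, the
variable names are invented, and I use the stricter ':encoding(UTF-8)' layer
in place of ':utf8'.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);
use File::Temp qw(tempfile);

# Character string; U+00EB (e-diaeresis) counts as one character.
my $name = "Zo\x{eb}";

# Hypothetical scratch file, created only for this illustration.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh;

# Writing: the layer encodes characters to UTF-8 bytes on the way out.
open(my $out, '>:encoding(UTF-8)', $path) or die "open $path: $!";
print {$out} "$name\n";
close $out or die "close $path: $!";

# Reading: the same layer decodes the bytes back into characters.
open(my $in, '<:encoding(UTF-8)', $path) or die "open $path: $!";
chomp(my $read_back = <$in>);
close $in;

# STDOUT has no encoding layer here, so encode explicitly before printing.
print encode_utf8($read_back), "\n";
```

Pick one approach per handle: either put a layer on it or call encode_utf8
yourself. Doing both on the same handle produces double-encoded output.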

Regards, Peter