[Catalyst] My Life with UTF-8
miyagawa at gmail.com
Sun Aug 13 06:39:21 CEST 2006
On 8/12/06, Jonathan Rockway <jon at jrock.us> wrote:
> The first unicode breakage I had was when I added Japanese-style dates
> as timestamps on the pages. (Japanese day-name character in
> parentheses.) What's weird was, adding this to the page worked fine --
> but it broke OTHER unicode characters on the page (sourced from a file
> or file attribute). Adding "use utf8" to the top of my source file
> fixed my problems, on Linux anyway. (Never tried on OpenBSD.)
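Sounds like the classic "Unicode string + UTF-8 bytes = BOOM"
problem. To solve that, you should handle everything as Unicode strings
(utf8-flagged), or everything as UTF-8 bytes (utf8::encode($str)).
Mixing the two breaks one side: when Perl concatenates a Unicode string
with a byte string, it treats each byte as a Latin-1 character, so the
UTF-8 bytes end up double-encoded.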
But that's sometimes hard, since some CPAN modules don't care about
Unicode strings and just return UTF-8 bytes.
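A minimal sketch of the breakage and the usual fix, assuming only the core Encode module. The byte string stands in for whatever a raw file read or a byte-returning CPAN module would hand you (simulated here with encode()); the two kanji are arbitrary sample data, and the file itself must be saved as UTF-8 for "use utf8" to apply.

```perl
use strict;
use warnings;
use utf8;                     # string literals in this file are characters, not bytes
use Encode qw(encode decode);

# A character string, like a literal under "use utf8".
my $day = "日";

# UTF-8 bytes, like a raw file read or a byte-oriented CPAN module
# would return (simulated with encode() for this sketch).
my $bytes = encode('UTF-8', "水");

# Broken: characters joined with bytes. Perl upgrades $bytes by treating
# each of its 3 bytes as a Latin-1 character, so "水" turns into junk.
my $broken = $day . $bytes;

# Fixed: decode bytes to characters at the input boundary, work with
# characters inside the program, and encode once on output.
my $fixed = $day . decode('UTF-8', $bytes);

binmode STDOUT, ':encoding(UTF-8)';   # encode once, at output
print length($broken), "\n";          # 4: one character + three mis-read bytes
print length($fixed), "\n";           # 2
print "$fixed\n";
```

The rule of thumb this illustrates: decode on the way in, keep characters inside, encode on the way out.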
> The next problem I noticed was that C::V::TT::ForceUTF8 broke TT's "uri"
> filter. According to the HTML validator, URIs can't be unicode, so you
> have to encode the URI to UTF-8. TT's URI filter was documented to do
> this, but it translated anything with the 8th bit set to nothing,
Yeah, Template::Stash::ForceUTF8 and Template::Provider::Encoding were
written precisely to fix that issue. Interesting to hear that TT's uri
filter gets borked by that, though. Do you have any code that
reproduces the breakage?
BTW we use Stash::ForceUTF8 and Provider::Encoding on our production
boxes and they work fine.
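For reference, here is roughly what a uri filter has to do with a Unicode string: encode the characters to UTF-8 bytes first, then percent-escape the bytes. This is a sketch, not TT's actual filter code; uri_encode_utf8 is a hypothetical helper, and leaving only RFC 3986's "unreserved" characters unescaped is an assumption (TT's filter may leave more characters alone).

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper (not part of TT): percent-encode a character string
# for use in a URI.
sub uri_encode_utf8 {
    my ($str) = @_;
    # 1. Characters -> UTF-8 bytes; URIs carry bytes, not characters.
    my $bytes = encode('UTF-8', $str);
    # 2. Escape every byte outside the RFC 3986 unreserved set as %XX.
    $bytes =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $bytes;
}

print uri_encode_utf8("caf\x{e9} \x{65E5}"), "\n";  # caf%C3%A9%20%E6%97%A5
```

URI::Escape's uri_escape_utf8 does the same two steps if you already depend on the URI distribution.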
> Any way I can tell perl, "trust me, everything is already UTF-8... don't
> #^$ing touch it."?
encoding::warnings might help you here. I'm not sure it actually
works, but its documentation is a great help at least.