[Catalyst] My Life with UTF-8
    Tatsuhiko Miyagawa 
    miyagawa at gmail.com
       
    Sun Aug 13 06:39:21 CEST 2006
    
    
  
Konnichiwa,
On 8/12/06, Jonathan Rockway <jon at jrock.us> wrote:
> The first unicode breakage I had was when I added Japanese-style dates
> as timestamps on the pages.  (Japanese day-name character in
> parenthesis.) What's weird was, adding this to the page worked fine --
> but it broke OTHER unicode characters on the page (sourced from a file
> or file attribute).  Adding "use utf8" to the top of my source file
> fixed my problems, on Linux anyway.  (Never tried on OpenBSD.)
Sounds like the traditional "Unicode string + UTF-8 bytes = BOOM"
problem. To solve it, you should handle everything as Unicode strings
(UTF-8 flagged), or everything as UTF-8 bytes (utf8::encode($str)).
If you mix the two, one of them ends up broken.
But that's sometimes hard, since some CPAN modules don't care about
Unicode strings and just return strings as UTF-8 bytes.
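To make the "BOOM" concrete, here is a minimal sketch using only core
Perl and the Encode module. The variable names are made up for
illustration, and the character used is U+65E5 (a day-name kanji),
chosen only as an example:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Two representations of the same character, U+65E5:
my $chars = "\x{65E5}";        # Unicode string: 1 character
my $bytes = "\xE6\x97\xA5";    # the same character as 3 UTF-8 bytes

# Mixing: Perl treats each byte of $bytes as a separate (Latin-1)
# character when joining it to a Unicode string, so nothing is decoded.
my $mixed = $chars . $bytes;   # 1 + 3 = 4 characters, not 2

# Option 1: decode the bytes first, keep everything as Unicode strings.
my $all_chars = $chars . decode('UTF-8', $bytes);

# Option 2: encode the string first, keep everything as UTF-8 bytes.
my $copy = $chars;
utf8::encode($copy);           # in place; $copy now holds 3 bytes
my $all_bytes = $copy . $bytes;

printf "mixed=%d chars=%d bytes=%d\n",
    length $mixed, length $all_chars, length $all_bytes;
```

This should print "mixed=4 chars=2 bytes=6". Either option is fine as
long as you pick one and stick to it for everything on the page.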
> The next problem I noticed was that C::V::TT::ForceUTF8 broke TT's "uri"
> filter.  According to the HTML validator, URIs can't be unicode, so you
> have to encode the URI to UTF-8.  TT's URI filter was documented to do
> this, but it translated anything with the 8th bit set to nothing,
Yeah, Template::Stash::ForceUTF8 and Template::Provider::Encoding were
made just to fix that issue. Interesting to hear that TT's uri filter
gets borked by that. Do you have any working code that shows the breakage?
BTW we use Stash::ForceUTF8 and Provider::Encoding on our production
boxes and they work fine.
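For comparison, this is roughly what I'd expect "encode the URI to
UTF-8" to mean. It's only a sketch and not TT's actual code:
escape_uri_utf8 is a hypothetical helper, and it is stricter than a
real uri filter because it escapes everything except the RFC 3986
unreserved characters:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of TT): percent-escape a Unicode
# string as UTF-8 bytes.
sub escape_uri_utf8 {
    my ($text) = @_;
    utf8::encode($text);    # characters -> UTF-8 bytes, on our copy
    # Escape every byte that is not an unreserved character.
    $text =~ s/([^A-Za-z0-9\-._~])/sprintf '%%%02X', ord $1/ge;
    return $text;
}

print escape_uri_utf8("\x{65E5} a"), "\n";
```

With a Unicode string as input this should print "%E6%97%A5%20a",
i.e. nothing with the 8th bit set gets dropped.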
> Any way I can tell perl, "trust me, everything is already UTF-8... don't
> #^$ing touch it."?
encoding::warnings might help you. I'm not sure if it actually works,
but the documentation should be a great help at least.
http://search.cpan.org/~audreyt/encoding-warnings-0.10/
-- 
Tatsuhiko Miyagawa
    
    