[Catalyst] Re: decoding in core

Neo [GC] neo at gothic-chat.de
Mon Feb 23 13:58:13 GMT 2009


Zbigniew Lukasiak wrote:
> Some more things to consider.
>
> - 'use utf8' in the code generated by the helpers?
>   
Reasonable, but only if documented. It took us weeks to learn that this
changes _nothing_ about input or output: it only tells Perl that the
source file itself is UTF-8, which affects how functions like regexps,
sort and so on see the string literals in that file.
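A minimal sketch of that point, assuming the script file itself is saved as UTF-8 (the variable names are made up for illustration). The same literal is compiled once without and once with the pragma; only the way Perl reads the literal changes, nothing about file handles, requests or the database:

```perl
use strict;
use warnings;

# Without the pragma, Perl takes the literal as the raw bytes in the file.
my $without = do { no utf8; "ä" };

# With the pragma, Perl decodes the literal from UTF-8 into one character.
my $with = do { use utf8; "ä" };

printf "without 'use utf8': length %d\n", length $without;   # 2 (bytes)
printf "with    'use utf8': length %d\n", length $with;      # 1 (character)

# Regexps follow the same split: one letter versus two unrelated bytes.
printf "single word char?   %s / %s\n",
    ($without =~ /^\w$/ ? "yes" : "no"),
    ($with    =~ /^\w$/ ? "yes" : "no");
```

Data that arrives at runtime is untouched by the pragma and still has to be decoded and encoded explicitly.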
> - ENCODING: UTF-8 for the TT view helper?
>
> Maybe a global config option to choose the byte or character semantics?
>
> But with the DB it becomes a bit more complex - because BLOB columns
> probably need to use byte semantics.
>   
Uhm, of course, as BLOB is binary and CLOB is character. ;) This is even
more complex, as the databases treat these datatypes differently and
some of Perl's DBI drivers are somewhat broken when it comes to Unicode
(according to our perl-saves-our-souls guru).
UTF-8 is OK in Perl itself (not easy, not coherent, but OK); but in
combination with many modules (and as far as I have learned, Perl is all
about reusing modules) it is _hell_. Try to read UTF-8 from an HTTP
request, store it in the database, select it in the correct order, write
it to XLS, convert that to CSV, reimport it into the DB and output it to
the browser, all with different subs in the same controller... and you
will know what I mean.
Even our most euphoric Perl gurus don't have any clue how to handle
UTF-8 from beginning to end without hours of trial and error in their
programs (and remember, we Germans only have those bloody umlauts; try
to imagine this in China >_<).

Maybe the best thing for all average-and-below users would be a _really_
good tutorial about Catalyst+UTF-8. What to do, what not to do. How to
read UTF-8 from an HTTP request / uploaded file / local file / database,
how to write it to the client / a downloadable file / a local file / the
database. Which Catalyst variable is UTF-8-encoded, when and why. How to
determine what encoding a given scalar has and how to
encode/decode/whatevercode it into a bloody nice scalar with shiny UTF-8
chars in it.
In short: -- Umlauts with Catalyst for dummies --
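As a rough idea of what the core of such a tutorial could show, here is a sketch of the usual "decode on the way in, encode on the way out" pattern using the core Encode module. The input bytes are hard-coded and merely stand in for a request body, file or database value that arrives undecoded; that, and all variable names, are assumptions for illustration and not something Catalyst does for you here:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Stand-in for undecoded input: "Müller" as UTF-8 bytes.
my $raw = "M\xC3\xBCller";

# Decode at the boundary: bytes in, characters out.
# FB_CROAK dies on malformed input; LEAVE_SRC keeps $raw unmodified.
my $text = decode('UTF-8', $raw, Encode::FB_CROAK | Encode::LEAVE_SRC);

printf "bytes: %d, characters: %d\n", length $raw, length $text;   # 7, 6

# Inside the program, work on characters only.
my $upper = uc $text;    # the u-umlaut is upper-cased like any other letter

# Encode at the boundary: characters in, bytes out.
my $out = encode('UTF-8', $upper, Encode::FB_CROAK | Encode::LEAVE_SRC);

# A scalar carries no encoding label. utf8::is_utf8 only reports Perl's
# internal storage flag; it does not say whether a string was decoded
# correctly or whether some bytes happen to be valid UTF-8.
my $raw_flag  = utf8::is_utf8($raw)  ? 1 : 0;
my $text_flag = utf8::is_utf8($text) ? 1 : 0;
printf "internal flag: raw=%d text=%d\n", $raw_flag, $text_flag;
```

Because a scalar does not remember where it came from, the only dependable approach is to know at each boundary whether you hold bytes or characters, rather than trying to detect it afterwards.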



(Sorry for sounding so emotional... AFAIK our company has burned
man-weeks on solving minor encoding bugs :-/ Every tutorial we found was
like "you can do it this way or that way or another way 'round the
house, so it's perfect, and if you don't understand it, you're stupid
and should use 7-bit ASCII"... and lately even a colleague sounds like
this, as he has been enlightened by CPAN literature like "UTF-8 vs. utf8
vs. UTF8" ;)).


Greets and regards,
Tom Weber


