[Catalyst] Re: decoding in core

Mon Feb 23 15:19:46 GMT 2009

Zbigniew Lukasiak wrote:
> Hmm - in my understanding it only changes literals in the code ( $var
> = 'ą' ).  So I looked into the pod and it says:
>
>     Bytes in the source text that have their high-bit set will be
>     treated as being part of a literal UTF-8 character.  This includes
>     most literals such as identifier names, string constants, and
>     constant regular expression patterns.
>   
Aaaaah SORRY! In my confusion I've mixed it up again...
So if I get it right, "use utf8" means you can do stuff like
$s =~ s/a/ä/; (as the plain ä in the source will be treated as one
character and not two octets), while the magical utf8 flag on $s tells
perl that the ä in the scalar really is an ä and not two strange octets.
Am I right or am I completely lost again?
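
To make that distinction concrete, here is a minimal sketch. It assumes
the script file itself is saved as UTF-8; the variable names are only for
illustration. It compares the same literal with and without "use utf8",
and shows the explicit step from characters to octets via Encode.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my ($with_pragma, $without_pragma);
{
    use utf8;                       # source literals are decoded as UTF-8
    $with_pragma = length "ä";      # one character
}
{
    no utf8;                        # source literals stay raw octets
    $without_pragma = length "ä";   # two octets (C3 A4)
}

use utf8;
my $s = "Bar";
$s =~ s/a/ä/;                       # the replacement is a single character

my $chars  = length $s;             # counts characters: B, ä, r
my $wire   = encode('UTF-8', $s);   # characters -> octets for output
my $octets = length $wire;          # counts octets: 42 C3 A4 72
my $back   = decode('UTF-8', $wire);   # octets -> characters again

printf "with pragma: %d, without: %d\n", $with_pragma, $without_pragma;
printf "chars: %d, octets: %d\n", $chars, $octets;
```

utf8::is_utf8($s) can be used to peek at the internal flag, but for
application code it is usually enough to decode on input and encode on
output, as in the last three assignments.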
> Hmm - maybe I'll add UTF-8 handling in InstantCRUD.  I am waiting for
> good sentences showing off the national characters.
Does it have to be a complete sentence? My favourite test-string is 
something like
äöüÄÖÜß"'+ (UTF-8)
C3 A4 C3 B6 C3 BC C3 84 C3 96 C3 9C C3 9F 22 27 2B (Hex)
If I can put this string into some HTML form, post/get it, process it, 
save it to and read it from the DB, output it to the browser _and_ still
have exactly 10 characters, the application _might_ work as it should.
The umlauts and the Eszett exercise the Unicode handling, the " and '
are fun with HTML and escaping, and the + ... well, URI-encoding, you
know...
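
The character count above can be checked with a small sketch like the
following. It builds the string from the hex dump, so it does not depend
on the encoding of the script file. The double-encoding step is my own
illustration of a typical failure, not something from a real application.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Octets exactly as listed in the hex dump above
my $hex    = 'C3A4C3B6C3BCC384C396C39CC39F22272B';
my $octets = pack 'H*', $hex;

# Octets -> characters; croak on malformed input, keep $octets intact
my $text = decode('UTF-8', $octets,
                  Encode::FB_CROAK() | Encode::LEAVE_SRC());

my $n_octets = length $octets;   # 7 two-octet characters + 3 ASCII = 17
my $n_chars  = length $text;     # 10 characters

# Encoding the characters again must reproduce the original octets
my $hex_again = uc unpack 'H*', encode('UTF-8', $text);

# Typical failure: octets mistaken for latin1 characters and encoded
# a second time somewhere on the way to the DB or the browser
my $double  = decode('UTF-8', encode('UTF-8', $octets));
my $n_wrong = length $double;    # 17 "characters" instead of 10

printf "octets: %d, chars: %d, after double encoding: %d\n",
    $n_octets, $n_chars, $n_wrong;
```

If the string comes back from the form/DB round trip with 17 characters
instead of 10, some layer encoded it twice or never decoded it.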

For even more fun, one should run a regex in the application on the
UTF-8 data (give me all those äÄs) and select it from the DB, first with
"blahfield LIKE 'ä'", maybe "upper(blahfield) LIKE upper('ä')" and
finally with an "ORDER BY blahfield", where blahfield should contain one
row starting with "a", one with "ä" and one with "b". The output should
have exactly this order and _not_ "a,b,ä" (hint hint: utf8 treated as
ascii or latin1).
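
The application side of that test could look roughly like this sketch.
It assumes the script is saved as UTF-8 and uses Unicode::Collate with
its default settings as a stand-in for a sensible collation; the sample
words are made up. What the database does for LIKE, upper() and ORDER BY
depends on its own charset and collation settings and is not covered
here.

```perl
use strict;
use warnings;
use utf8;                  # literals below are characters, not octets
use Unicode::Collate;

my @rows = ('banane', 'äpfel', 'apfel');

# "give me all those äÄs": count them in a decoded string
my $text  = 'Ärger mit ä und Ä';
my $count = () = $text =~ /[äÄ]/g;

# Plain sort compares code points, so ä (U+00E4) lands after b
my @by_codepoint = sort @rows;

# Unicode::Collate sorts ä together with a, before b
my @collated = Unicode::Collate->new->sort(@rows);

binmode STDOUT, ':encoding(UTF-8)';   # encode characters on output
print "matches:    $count\n";
print "code point: @by_codepoint\n";
print "collated:   @collated\n";
```

The code point order is the "a,b,ä" result mentioned above; the collated
order is the one the ORDER BY test expects.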

Greets and regards,
Tom Weber