[Catalyst] Problem with Catalyst::Plugin::I18N using UTF-8

Knut-Olav Hoven hovenko at linpro.no
Sat Dec 22 00:13:50 GMT 2007


On Friday 21 December 2007 22:35:19 Ash Berlin wrote:
> Right, I think there is some confusion on your part as to what is the
> proper way of handling unicode in perl.

Yes, I have always found UTF-8 confusing ;)

>
> (The basic problem is that "perl's magic internal representation" just
> happens to look exactly like UTF-8 plus a magic flag. Longer
> description below)

OK, I find this UTF8 flag a little more confusing than it probably needs to
be, but I'll try to keep that in mind.

>
>
> First off, you need to understand the difference between characters
> and bytes/octets
>
> "æøå" is a character string
> "\303\246\303\270\303\245" is a utf8 byte sequence != a string
>
> "\303\246\303\270\303\245" + UTF8 flag = "æøå" perl string

As far as I know, "\303\246\303\270\303\245" is a string in its UTF-8
representation, but it turns into garbage when you try to display it as
something else (like ISO-8859-1).
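
To make the bytes-versus-characters distinction concrete, here is a minimal
sketch using the core Encode module. The variable names are only
illustrative; the octal escapes are the ones quoted above.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Six octets: the UTF-8 encoding of the three characters æ, ø, å.
my $octets = "\303\246\303\270\303\245";

# Decoding turns the octets into a three-character string.
my $chars = decode('UTF-8', $octets);

printf "octets: length %d\n", length $octets;   # 6
printf "chars:  length %d\n", length $chars;    # 3

# Encoding goes the other way and gives back the original octets.
my $round_trip = encode('UTF-8', $chars);
print "round trip ok\n" if $round_trip eq $octets;
```

The same six octets are either "six bytes" or "three characters" depending
on whether they have been decoded, which is the point Ash is making.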

>
>  From perldoc perlunicode
>
> [...]
>
> To relate this to your problem, you are getting some of your data
> double encoded because the data (from the perl module you are using to
> access your LDAP server) is returning a byte sequence that perl
> doesn't know is supposed to be UTF8.
>
> The answer is to do Encode::decode("utf8", $utf8_byte_sequence)
>   on all the data coming back from your LDAP server (or to find the
> right option to make the module you are using do it).

I'm not having any problems with my LDAP server or any data that is sent to or 
from it... not yet at least ;)

>
> Any of this make any sense?

Yes, some.

...but I'll wait a while before using Catalyst::Plugin::Unicode, since I
solved my problems by changing the "Decode" parameter to 0 and avoiding the
utf8::encode call in uri_for. The value I pass to uri_for comes from the
I18N plugin and is already UTF-8.

According to the utf8 manpage, the utf8 pragma should not be used unless you
are writing your source code in UTF-8. I do write my source code in UTF-8,
but as I understand it you should not use utf8::encode unless you write
characters like "æøå" in your code. My "æøå" comes from other sources, such
as the browser or LDAP, so I should not run utf8::encode on those variables.


I made four tests, split into two groups. The first group gets its input
from a variable in the code, while the second group gets its input from a
parameter on the command line. The first test in each group just prints the
value, while the second test runs utf8::encode on the variable first.

From what I understand from this, utf8::encode brings nothing useful in
Catalyst::uri_for, only pain.

# From inside code
perl -e 'use utf8; my $a = "test æøå"; print $a;'
test ��

perl -e 'use utf8; my $a = "test æøå"; utf8::encode $a; print $a;'
test æøå

# From arguments on command line
perl -e 'use utf8; my $a = shift; print $a;' "æøå"
æøå

perl -e 'use utf8; my $a = shift; utf8::encode $a; print $a;' "test æøå"
test æøå
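
As a rough illustration of why encoding an already-encoded value hurts, here
is a minimal sketch. It assumes the incoming value is plain UTF-8 octets
(as a command-line argument on a UTF-8 terminal would be); the variable
names are hypothetical and this is not the code from uri_for.

```perl
use strict;
use warnings;

# Assumed input: octets that are already the UTF-8 encoding of "æøå".
my $already_encoded = "\303\246\303\270\303\245";

# utf8::encode treats each octet as one character and encodes it again,
# so every octet above 0x7F becomes two octets.
my $twice = $already_encoded;
utf8::encode($twice);

printf "before: %d octets\n", length $already_encoded;   # 6
printf "after:  %d octets\n", length $twice;             # 12

# utf8::decode peels off one layer of encoding again.
my $undone = $twice;
utf8::decode($undone);
print "one decode restores the original octets\n"
    if $undone eq $already_encoded;
```

Counting lengths rather than printing the strings keeps the result
independent of how the terminal is configured.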


>
>
> PS. It seems that even Apple has problems with UTF8. In writing this
> email I saved it in my drafts folder. When I came back to edit it
> again, the non-ascii characters got fluffed up. Fun eh?
>
>
>



-- 
Knut-Olav Hoven
Systemutvikler               mob: +47 986 71 700
Linpro AS                    http://www.linpro.no/