[Catalyst] Problem with Catalyst::Plugin::I18N using UTF-8
Knut-Olav Hoven
hovenko at linpro.no
Sat Dec 22 00:13:50 GMT 2007
On Friday 21 December 2007 22:35:19 Ash Berlin wrote:
> Right, I think there is some confusion on your part as to what is the
> proper way of handling unicode in perl.
Yes, I have always found UTF-8 confusing ;)
>
> (The basic problem is that "perl's magic internal representation" just
> happens to look exactly like UTF-8 plus a magic flag. Longer
> description below)
OK, I find this UTF8 flag a little more confusing than it probably needs to
be, but I'll try to keep that in mind.
>
>
> First off, you need to understand the difference between characters
> and bytes/octets
>
> "æøå" is a character string
> "\303\246\303\270\303\245" is a utf8 byte sequence != a string
>
> "\303\246\303\270\303\245" + UTF8 flag = "æøå" perl string
As far as I know, "\303\246\303\270\303\245" is a string in Unicode
representation, but it becomes garbage when you try to display it as
something else (like ISO-8859-1).
>
> From perldoc perlunicode
>
> [...]
>
> To relate this to your problem, you are getting some of your data
> double encoded because the data (from the perl module you are using to
> access your LDAP server) is returning a byte sequence that perl
> doesn't know is supposed to be UTF8.
>
> The answer is to do Encode::decode("utf8", $utf8_byte_sequence)
> on all the data coming back from your LDAP server (or to find the
> right option to make the module you are using do it).
I'm not having any problems with my LDAP server or any data that is sent to or
from it... not yet at least ;)
>
> Any of this make any sense?
Yes, some.
...but I'll wait a while before using Catalyst::Plugin::Unicode, since I
solved my problems by changing the "Decode" parameter to 0 and avoiding the
utf8::encode call in uri_for. The value I pass to uri_for comes from the
I18N plugin and is already UTF-8.
According to the utf8 manpage, the utf8 pragma should not be used unless you
are writing your source code in utf8. I write my source code in utf8, but as
I understand it you should not use utf8::encode unless you write characters
like "æøå" in your code. My "æøå" comes from other sources, such as the
browser or LDAP, so I should not run utf8::encode on those variables.
I made four tests, split into two groups. The first group gets its input
from a variable in the code, while the second group gets its input from a
parameter on the command line. The first test in each group just prints the
value, while the second test runs utf8::encode on the variable first.
From what I understand from this, utf8::encode brings nothing useful to
Catalyst::uri_for, only pain.
# From inside code
perl -e 'use utf8; my $a = "test æøå"; print $a;'
test ��
perl -e 'use utf8; my $a = "test æøå"; utf8::encode $a; print $a;'
test æøå
# From arguments on command line
perl -e 'use utf8; my $a = shift; print $a;' "æøå"
æøå
perl -e 'use utf8; my $a = shift; utf8::encode $a; print $a;' "test æøå"
test æøå
>
>
> PS. It seems that even Apple has problems with UTF8. In writing this
> email I saved it in my drafts folder. When I came back to edit it
> again, the non-ascii characters got fluffed up. Fun eh?
>
>
>
--
Knut-Olav Hoven
Systemutvikler mob: +47 986 71 700
Linpro AS http://www.linpro.no/