[Catalyst] Re: Avoiding UTF8 in Catalyst

Carl Johnstone catalyst at fadetoblack.me.uk
Mon Nov 23 17:38:29 GMT 2009


Aristotle Pagaltzis wrote:
> But there’s no room for “likelies” here: that’s programming by
> coincidence.

The "likely" was correct.

When using UTF-8, whether a string's length differs in bytes and in
characters depends entirely on its contents. Given a particular string I
could tell you exactly whether the two should match, but in the general case
all I can say is that they're *likely* to differ.
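A minimal sketch of that point. It assumes the strings are Perl character strings and uses a hypothetical helper, utf8_byte_length, built on Encode::encode. Pure ASCII text has as many UTF-8 bytes as characters, and anything outside ASCII takes more bytes than characters:

```perl
use strict;
use warnings;
use Encode qw(encode);

binmode STDOUT, ':encoding(UTF-8)';   # print wide characters without warnings

# Hypothetical helper: byte length of a character string encoded as UTF-8.
sub utf8_byte_length {
    my ($text) = @_;
    return length encode('UTF-8', $text);
}

# ASCII: one byte per character. Non-ASCII characters need 2 or more bytes.
for my $s ("hello", "caf\x{e9}", "\x{65e5}\x{672c}") {
    printf "%s: %d characters, %d bytes\n",
        $s, length($s), utf8_byte_length($s);
}
```

This prints 5/5 for "hello", 4/5 for "café" and 2/6 for "日本".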

In any case that's an argument about English :-)

> Either you want it or you don’t, and in this case
> you do. But bytes::length doesn’t do that.
>
> Please please don’t make statements like “not in this case”
> without knowing what the thing you are talking about does, i.e.
> in this case bytes::length, does. There are enough misconceptions
> about Unicode in Perl already.

As for the usage of bytes::length: yes, I agree with you that the code is
wrong, as it takes the byte length of perl's internal representation. That
happens to be UTF-8 in this case (and only when the string has been
upgraded), so the result is correct here, but it isn't for any other
character set and shouldn't be relied upon.

You *do* have to take a byte length of the string in the destination 
character set though, so I'm interested in what the correct solution would 
be.
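One possible approach, sketched under stated assumptions: the body is a Perl character string, and the hypothetical helper encoded_length encodes it into the destination charset with Encode::encode and counts the resulting octets. 'UTF-8' and 'ISO-8859-1' are only example charsets. Because encode() returns a byte string, length() on the result counts bytes whatever perl's internal representation of the input was:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: how many octets $text occupies once encoded into
# $charset. encode() returns a byte string, so length() on the result
# counts bytes no matter how perl stored $text internally.
sub encoded_length {
    my ($text, $charset) = @_;
    # FB_CROAK makes encode() die on characters $charset cannot represent,
    # instead of quietly substituting '?' and reporting a misleading length.
    return length encode($charset, $text, Encode::FB_CROAK());
}

my $body = "caf\x{e9} \x{2615}";                       # 6 characters
print encoded_length($body, 'UTF-8'), "\n";            # 9
print encoded_length("caf\x{e9}", 'ISO-8859-1'), "\n"; # 4
```

The number is only meaningful as a Content-Length if the same encoded octets are what actually get sent.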

Carl
