[Catalyst] create search engine friendly uri from string
robin at berjon.com
Tue Dec 16 11:46:58 GMT 2008
On Dec 16, 2008, at 12:20, <onken at houseofdesign.de> wrote:
> On Tue, 16 Dec 2008 11:51:28 +0100, Robin Berjon <robin at berjon.com> wrote:
>> Before putting that into a module, though, you might want to think
>> about what should happen to characters outside the [a-z0-9] range, as \W
>> matches differently based on locale. I'm not sure what the recommended
>> behaviour is for such cases.
> That's what I'm thinking about right now. I couldn't find a reference
> which says that \W matches differently based on locale.
perldoc perlre says:

  A "\w" matches a single alphanumeric character (an alphabetic
  character, or a decimal digit) or "_", not a whole word. Use "\w+" to
  match a string of Perl-identifier characters (which isn't the same as
  matching an English word). If "use locale" is in effect, the list of
  alphabetic characters generated by "\w" is taken from the current
  locale.
> Python can convert a UTF-8 string to an ASCII string and replace
> characters like "ä" with the closest equivalent character, "a". Is there
> something like that for Perl?
There's a host of modules on CPAN that do things like that, but I
don't know if one is accepted as the best way to go. The catch is
that if you want to cover all your bases, it can become a rather
extensive problem. For instance, you might want to convert "é" to "e",
but do you want to map "北京" to "beijing"?
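As a rough illustration of the simple end of that spectrum, here is a minimal sketch assuming decoded character strings and only the core Unicode::Normalize module; strip_diacritics is a hypothetical helper name, not an existing CPAN function. It decomposes each character (NFD) and drops the combining marks, which turns "é" into "e" but leaves "北京" alone.

```perl
use strict;
use warnings;
use utf8;                        # this source file contains literal non-ASCII text
use Unicode::Normalize qw(NFD);

# Hypothetical helper: canonical decomposition turns "é" (U+00E9) into
# "e" + U+0301; deleting the combining marks then leaves plain "e".
sub strip_diacritics {
    my ($text) = @_;             # expects a decoded character string
    my $decomposed = NFD($text);
    $decomposed =~ s/\p{Mn}+//g; # \p{Mn}: nonspacing (combining) marks
    return $decomposed;
}

binmode STDOUT, ':encoding(UTF-8)';
print strip_diacritics("Café Müller"), "\n";  # prints "Cafe Muller"
print strip_diacritics("北京"), "\n";          # prints "北京" (nothing to decompose)
```

Characters with no decomposition, such as "ø" or "ß", also pass through unchanged, which is exactly where covering all your bases starts to get expensive.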
The simple solution is probably to have one option that produces IRI-friendly
output and another that produces URI-friendly output, and let people who want
something more complicated roll their own. See http://annevankesteren.nl/2004/08/uri-design
for some related thoughts, or http://www.w3.org/International/iri-edit/draft-duerst-iri-bis.html.
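A minimal sketch of those two options, assuming decoded character strings and Perl 5.12 or later (for the unicode_strings feature); iri_slug and uri_slug are hypothetical names. The IRI variant keeps any Unicode letter, mark or digit; the URI variant percent-encodes the UTF-8 bytes of that result so only ASCII remains.

```perl
use strict;
use warnings;
use utf8;                               # literal non-ASCII text below
use feature 'unicode_strings';          # Unicode rules for lc on all strings

# Hypothetical IRI-friendly slug: non-ASCII letters are kept as-is.
sub iri_slug {
    my ($text) = @_;                    # decoded character string
    my $slug = lc $text;
    # Unicode properties rather than \W, so the kept set is fixed.
    $slug =~ s/[^\p{L}\p{M}\p{N}]+/-/g;
    $slug =~ s/^-+|-+$//g;              # trim leading/trailing hyphens
    return $slug;
}

# Hypothetical URI-friendly slug: same slug, then percent-encoded UTF-8.
sub uri_slug {
    my $slug = iri_slug(@_);
    utf8::encode($slug);                # characters -> UTF-8 bytes
    # Escape every byte that is not an RFC 3986 unreserved character.
    $slug =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $slug;
}

binmode STDOUT, ':encoding(UTF-8)';
print iri_slug("Hôtels à Paris!"), "\n";  # prints "hôtels-à-paris"
print uri_slug("Hôtels à Paris!"), "\n";  # prints "h%C3%B4tels-%C3%A0-paris"
```

Anyone wanting "北京" to come out as "beijing" would still have to bring their own transliteration step before either function.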
But that doesn't address the locale issue. For that, be sure to add a
"no locale" (it is lexically scoped), or define your own character classes
instead of using \w, \s, and friends.
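Here is a minimal sketch of the explicit-character-class approach; ascii_slug is a hypothetical name. "no locale" is lexically scoped, and the [a-z0-9] class spells out exactly which characters survive, so the result depends neither on \w nor on the process locale.

```perl
use strict;
use warnings;
no locale;   # lexically scoped: locale-aware behaviour is off for the rest of this file

# Hypothetical ASCII-only slug helper built from an explicit character class.
sub ascii_slug {
    my ($text) = @_;
    my $slug = lc $text;
    $slug =~ s/[^a-z0-9]+/-/g;   # any run outside a-z and 0-9 becomes one hyphen
    $slug =~ s/^-|-$//g;         # drop a leading or trailing hyphen
    return $slug;
}

print ascii_slug("Create search engine friendly URI from string!"), "\n";
# prints "create-search-engine-friendly-uri-from-string"
```

Non-ASCII letters simply become hyphens here ("Müller" gives "m-ller"), so you'd likely run a transliteration step first if that matters to you.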
Robin Berjon - http://berjon.com/
Feel like hiring me? Go to http://robineko.com/