[Catalyst] create search engine friendly uri from string

Robin Berjon robin at berjon.com
Tue Dec 16 11:46:58 GMT 2008


On Dec 16, 2008, at 12:20, <onken at houseofdesign.de> wrote:
> On Tue, 16 Dec 2008 11:51:28 +0100, Robin Berjon <robin at berjon.com> wrote:
>> Before putting that into a module though you might want to think about
>> what should happen to characters outside the [a-z0-9] range as \W will
>> match differently based on locale. I'm not sure what the recommended
>> behaviour is for such cases.
>
> That's what I'm thinking about right now. I couldn't find a reference
> which says that \W matches differently based on locale.

From perlre:

        A "\w" matches a single alphanumeric character (an alphabetic
        character, or a decimal digit) or "_", not a whole word.  Use "\w+"
        to match a string of Perl-identifier characters (which isn't the
        same as matching an English word).  If "use locale" is in effect,
        the list of alphabetic characters generated by "\w" is taken from
        the current locale.

> Python can convert a UTF-8 string to an ASCII string, replacing
> characters like "ä" with the closest equivalent character "a". Is there
> such a thing for Perl?

There's a host of modules on CPAN that do things like that, but I  
don't know if one is accepted as the better way to go. The problem is  
that if you want to cover all your bases it can become a rather  
extensive problem. For instance you might want to convert "é" to "e",  
but do you want to map "北京" to "beijing"?
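
For the simple accent-stripping case, here is a minimal sketch that uses only the core Unicode::Normalize module rather than any particular CPAN transliteration module. The helper name strip_accents is made up for illustration, and it assumes the input is already a decoded character string. It decomposes each character and drops the combining marks, so it does nothing for scripts without such decompositions.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# Hypothetical helper (not from any module): remove accents by
# canonically decomposing the string and deleting combining marks.
# Assumes $text is a decoded character string, not raw UTF-8 bytes.
sub strip_accents {
    my ($text) = @_;
    my $decomposed = NFD($text);      # e.g. U+00E9 becomes "e" + U+0301
    $decomposed =~ s/\p{Mn}+//g;      # drop nonspacing (combining) marks
    return $decomposed;
}
```

With this, strip_accents("\x{e9}") gives "e", while "\x{5317}\x{4eac}" (the two characters for Beijing) comes back unchanged, which is exactly where the harder transliteration question starts.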

The simple solution is probably to have one option that encodes to an
IRI-friendly form and another to a URI-friendly form, and let people who
want something more complicated roll their own. See
http://annevankesteren.nl/2004/08/uri-design for some thoughts related to
this, or http://www.w3.org/International/iri-edit/draft-duerst-iri-bis.html.
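
To illustrate the difference, a rough sketch of the URI-friendly side, assuming the input is a decoded character string and that only the RFC 3986 unreserved characters should stay literal. The function name uri_friendly is hypothetical; an IRI-friendly variant would simply leave the non-ASCII characters alone.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: percent-encode a path segment for use in a URI.
# The text is first encoded to UTF-8 bytes, then every byte outside the
# unreserved set (letters, digits, "-", ".", "_", "~") becomes %XX.
sub uri_friendly {
    my ($text) = @_;
    my $bytes = encode('UTF-8', $text);
    $bytes =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $bytes;
}
```

So uri_friendly("\x{e9}") yields "%C3%A9", whereas the IRI form would carry the character itself.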

But that doesn't address the locale issue. For that, be sure to toss in
a "no locale" (which is lexically scoped) or to define your own character
classes instead of \w, \s, and friends.
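
Something along these lines would do, as a sketch only. The name slugify and the choice of "-" as separator are my assumptions, not anything from an existing module. The point is the explicit [^a-z0-9] class, which behaves the same whatever the locale.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a string into a URI-friendly slug.
sub slugify {
    my ($text) = @_;
    no locale;                    # lexical: cancels any enclosing "use locale"
    my $slug = lc $text;
    $slug =~ s/[^a-z0-9]+/-/g;    # explicit ASCII class instead of \W
    $slug =~ s/^-+|-+$//g;        # trim leading and trailing separators
    return $slug;
}
```

For example, slugify("Hello, World!") gives "hello-world". Anything outside a-z and 0-9 collapses into the separator, so accented input would need transliterating first if you want to keep it.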

-- 
Robin Berjon - http://berjon.com/
     Feel like hiring me? Go to http://robineko.com/







