[Catalyst] Escaping of "argument" of private path

Octavian Rasnita orasnita at gmail.com
Wed Mar 16 07:02:10 GMT 2011


From: "John M. Dlugosz" <wxju46gefd at snkmail.com>
> On 3/15/2011 4:56 AM, Octavian Rasnita orasnita-at-gmail.com 
> |Catalyst/Allow to home| wrote:
>>
>> uri_for() escapes only the chars which are not in the following list 
>> (from URI.pm):
>>
>> $reserved   = q(;/?:@&=+$,[]);
>> $mark       = q(-_.!~*'());                                    #'; emacs
>> $unreserved = "A-Za-z0-9\Q$mark\E";
>>
>> The char "&" is a valid char in the URI, so it should not be escaped. 
>> In other words, the following URL is OK:
>>
>> http://localhost/dir1/dir2/ham%20&%20eggs.jpg
>>
>> uri_for() generates the URI as it needs to be accessed on the server and 
>> not as it should be printed in an HTML page. In order to be printed 
>> correctly, the "&" char must be HTML-encoded, so the html TT filter must 
>> be used:
>>
>> <a href="[% c.uri_for('/path', 'eggs & ham.jpg', {a=1, b=2}).path_query | 
>> html%]">label</a>
>>
>> It will give:
>>
>> <a href="/path/eggs%20&amp;%20ham.jpg?a=1&amp;b=2">label</a>
>>
>
> In contrast, the 'uri' filter in TT "converting any characters outside of 
> the permitted URI character set (as defined by RFC 2396)" and that 
> includes |&|, |@|, |/|, |;|, |:|, |=|, |+|, |?| and |$|.
> The 'url' filter in TT is less aggressive, and does not include those.


Those chars have to be escaped when they appear as data in a query string, 
but most of them are permitted as they are in the path part of a URL. The 
"?", "&", "=", "+" and ";" signs are used for separating the path from the 
query string, for delimiting the query string parts and for representing a 
space char.
Apart from "?" and "/", they can also be used in the names of the files in 
the path. For example, the following URL is valid:

http://localhost/static/a%20&%20@%20;%20$%20+%20=.txt
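A quick way to check this is to split the URL with a generic URL parser and look at what lands in the path and what lands in the query. The sketch below uses Python's standard urllib.parse instead of Perl's URI.pm (an assumption made only for illustration; the thread itself is about Catalyst and TT):

```python
from urllib.parse import urlsplit, unquote

# The example URL from the message, with reserved chars left literal in the path.
url = "http://localhost/static/a%20&%20@%20;%20$%20+%20=.txt"

parts = urlsplit(url)

# No "?" appears in the URL, so everything after the host is path
# and the query component stays empty.
path = parts.path
query = parts.query

# Decoding the %20 escapes gives back the file name as stored on disk.
decoded = unquote(path)

print(path)
print(repr(query))
print(decoded)
```

The "&", "@", ";", "$", "+" and "=" chars stay in the path untouched, and only the spaces needed the %20 escape.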

If you want, you can escape these chars everywhere, not only in the query 
strings, but why would you want to do this?

> The '&' is a "Reserved Character" according to §2.2 of RFC 2396.  That is 
> what the code sample you quoted notes: the set of reserved characters. 
> They may have specific meanings as delimiters within the overall URI, so 
> should be escaped.  Just skimming, I see that it's reserved within the 
> query component.


Yes, but uri_for() escapes them in the query components (where they need to 
be escaped).

For example:

[% file = 'a+b = c & $î @â'; a = 'a+b = c & $î @â'; b= 'a+b = c & $î @â' %]
<a href="[% c.uri_for('/path', file, {a=a, b=b}).path_query %]">label</a>

will display:

<a href="/path/a+b%20=%20c%20&%20$%C3%AE%20@%C3%A2?a=a%2Bb+%3D+c+%26+%24%C3%AE+%40%C3%A2&b=a%2Bb+%3D+c+%26+%24%C3%AE+%40%C3%A2">label</a>

Note that I did not HTML-encode the URL, so that the result is easier to read.
As you can see, the reserved chars are escaped by uri_for() only where they 
need to be escaped.
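The same split between path escaping and query escaping can be sketched outside Catalyst. The following is not the uri_for() implementation; it is a minimal Python illustration in which the helper name path_query and the SEGMENT_SAFE list are assumptions chosen to mirror the output shown above:

```python
from urllib.parse import quote, urlencode

# Assumption: reserved chars that may stay literal inside one path segment.
# "/" and "?" are left out on purpose, because they delimit segments
# and the start of the query string.
SEGMENT_SAFE = "&@;$+=:,"

def path_query(segments, params):
    """Hypothetical helper: escape path segments and query values separately."""
    # Path: spaces and non-ASCII bytes become %XX, reserved chars stay as they are.
    path = "/" + "/".join(quote(s, safe=SEGMENT_SAFE) for s in segments)
    # Query: form-style encoding, space -> "+", reserved chars -> %XX.
    query = urlencode(params)
    return path + "?" + query if query else path

value = "a+b = c & $î @â"
url = path_query(["path", value], {"a": value, "b": value})
print(url)
```

With these assumptions the printed string matches the href above: the "+", "=", "&", "$" and "@" chars survive in the path, while the same chars inside the query values are percent-encoded so that they cannot be taken for delimiters.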

And of course, if you need to print this URL in an HTML document, you can 
add the TT html filter and the "&" chars will be displayed as &amp;.
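HTML-encoding is a separate layer applied on top of a finished URL, which is why it is done once at the end. A minimal sketch of that step, using Python's html module as a stand-in for the TT html filter (the variable names are only for illustration):

```python
from html import escape, unescape

# A URL as it must be requested from the server (already URI-escaped).
url = "/path/eggs%20&%20ham.jpg?a=1&b=2"

# HTML-encode it once, just before placing it in the attribute value.
encoded = escape(url, quote=True)
anchor = '<a href="{}">label</a>'.format(encoded)
print(anchor)

# The browser reverses the HTML encoding before it requests the URL.
decoded = unescape(encoded)
```

Only the "&" chars change (to &amp;); the %20 escapes are untouched, and decoding the attribute value gives back the original URL.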


> Anyway, using the TT 'uri' filter on the dynamic path component means I 
> don't have to use the html filter also!


Why would you want to escape every path component separately with the TT 
uri filter, and escape the reserved chars even where they can be used as 
they are, instead of using the html filter once?

If you want, you can uri-escape even the [a-zA-Z0-9] chars, but why would 
you want to escape chars where they don't need to be escaped? :-)

Octavian



