[Catalyst] Url Encoded UTF8 parameters

Bill Moseley moseley at hank.org
Sun Aug 2 05:36:34 GMT 2015


On Sat, Aug 1, 2015 at 6:31 AM, Stefan <maillist at s.profanter.me> wrote:

> Hi,
>
> if a URL parameter contains a Unicode character (e.g.
> www.example.com/?param=%D6lso%DF which stands for param=Ölsoße), the
> parameter is not correctly parsed as Unicode.
>

> 4. This outputs for the example URL localhost:3000/?param=%D6lso%DF:
>
> [debug] $VAR1 = {
>
>           'param' => "\x{fffd}lso\x{fffd}e"
>
>         };
>
> [debug] $VAR1 = '\x{d6}lso\x{df}e';
>
>
>
>
>
> As you can see, the first output only contains one equal character:
> \x{fffd} which is obviously not the same as it should be: \x{d6}lso\x{df}e
>

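\x{fffd} is the Unicode replacement character (U+FFFD). Encode substitutes
it for each invalid UTF-8 sequence in the input. %D6 and %DF are the
ISO-8859-1 (Latin-1) bytes for Ö and ß, and neither is valid UTF-8 on its
own.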
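
A minimal sketch of that substitution outside Catalyst, assuming only the
Encode and URI::Escape modules. The variable names are made up for
illustration, and the input includes the trailing "e" that is visible in
the dump above:

```perl
use strict;
use warnings;
use Encode qw(decode);
use URI::Escape qw(uri_unescape);

# Percent-decoding yields raw bytes: 0xD6 "lso" 0xDF "e".
my $bytes = uri_unescape('%D6lso%DFe');

# Neither 0xD6 nor 0xDF is followed by a continuation byte, so with the
# default CHECK value Encode emits U+FFFD for each of them.
my $as_utf8 = decode('UTF-8', $bytes);

# The same bytes read as ISO-8859-1 give the characters that were intended.
my $as_latin1 = decode('ISO-8859-1', $bytes);

# Print code points in ASCII to avoid terminal encoding issues.
for my $str ($as_utf8, $as_latin1) {
    print join(' ', map { sprintf 'U+%04X', ord } split //, $str), "\n";
}
```

The first line printed should start with U+FFFD and the second with U+00D6,
matching the two debug dumps in the quoted message.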

Try this instead in your browser:

?param=Ölsoße


Then print $c->request->parameters->{param}. If you check
Encode::is_utf8() on that value it should be true, too, indicating the
param was decoded correctly into characters.

Or if you prefer:

$ perl -le 'use URI::Escape; print uri_escape( "Ölsoße" )'
%C3%96lso%C3%9Fe


so,

?param=%C3%96lso%C3%9Fe


but most likely the browser will turn it back into ?param=Ölsoße
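
The round trip for the UTF-8 escaped form can be checked without a running
application. This is only a sketch of what the framework is expected to do
with the parameter, not Catalyst's actual code path; $octets and $param are
illustrative names:

```perl
use strict;
use warnings;
use Encode qw(decode is_utf8);
use URI::Escape qw(uri_unescape);

# The percent-encoded UTF-8 form of the parameter value.
my $octets = uri_unescape('%C3%96lso%C3%9Fe');

# Decode the UTF-8 bytes into a Perl character string.
my $param = decode('UTF-8', $octets);

printf "octets: %d, characters: %d, is_utf8: %s\n",
    length($octets), length($param), (is_utf8($param) ? 'yes' : 'no');
```

Ö and ß take two bytes each in UTF-8, so the byte count is higher than the
character count. Note that is_utf8() reports Perl's internal flag, so treat
it as a hint rather than proof that the data is correct.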


If you want to be explicit that the string literals in your source are
UTF-8 (i.e. "use utf8;"), encode them to bytes before escaping:

$ perl -le 'use URI::Escape; use Encode; use utf8; print
uri_escape( encode_utf8( "Ölsoße" ) )'
%C3%96lso%C3%9Fe

or

$ perl -le 'use URI::Escape; use utf8; print uri_escape_utf8( "Ölsoße" )'
%C3%96lso%C3%9Fe


All three produce the same escaped string.
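
For comparison, here is a small script (a sketch, with made-up variable
names) that writes the string with \x{} escapes so the result does not
depend on how the source file is saved. It also shows one way the Latin-1
form from the original question can arise: escaping a character string
without encoding it to UTF-8 first.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);
use URI::Escape qw(uri_escape uri_escape_utf8);

# "Ölsoße" as a character string, independent of source file encoding.
my $chars = "\x{D6}lso\x{DF}e";

# No encoding step: code points 0xD6 and 0xDF are escaped as single bytes.
my $latin1 = uri_escape($chars);

# Encode to UTF-8 bytes first, then escape.
my $via_encode = uri_escape(encode_utf8($chars));

# uri_escape_utf8 does the UTF-8 encoding itself.
my $via_helper = uri_escape_utf8($chars);

print "$latin1\n";      # %D6lso%DFe
print "$via_encode\n";  # %C3%96lso%C3%9Fe
print "$via_helper\n";  # %C3%96lso%C3%9Fe
```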


-- 
Bill Moseley
moseley at hank.org