[Catalyst] Url Encoded UTF8 parameters

John Napiorkowski jjn1056 at yahoo.com
Mon Aug 3 18:41:08 GMT 2015

I'd be interested in having some sort of flag on the request that indicates whether the incoming query was bad.  I can't die here for legacy reasons.
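To make the idea concrete, here is a rough sketch of what "flag instead of die" could look like. The decode_with_flag helper is hypothetical and not part of Catalyst; it only assumes the core Encode module. It tries a strict decode first and, on failure, falls back to the lenient U+FFFD substitution while reporting that the input was bad.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper, not a Catalyst API: decode without dying and
# report whether the input had to be repaired.
sub decode_with_flag {
    my ($octets) = @_;

    # Strict attempt: croaks on malformed UTF-8, leaves $octets untouched.
    my $text = eval { decode('UTF-8', $octets, FB_CROAK | LEAVE_SRC) };
    return ($text, 0) if defined $text;

    # Lenient fallback: malformed octets become U+FFFD, and the flag is set.
    return (decode('UTF-8', $octets), 1);
}

my ($text, $was_bad) = decode_with_flag("\xD6lso\xDFe");
print "query was bad\n" if $was_bad;
```

The legacy behaviour (substituted text) is preserved, and the caller decides what to do with the flag.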

     On Sunday, August 2, 2015 9:39 AM, Bill Moseley <moseley at hank.org> wrote:

 BTW -- I wonder about the Catalyst behavior here.

On Sat, Aug 1, 2015 at 10:36 PM, Bill Moseley <moseley at hank.org> wrote:

On Sat, Aug 1, 2015 at 6:31 AM, Stefan <maillist at s.profanter.me> wrote:

Hi, if a URL parameter contains a Unicode character (e.g. www.example.com/?param=%D6lso%DFe, which stands for param=Ölsoße), the parameter is not correctly parsed as Unicode.

One note here -- data over the wire must be encoded into octets.   So, all Unicode characters must be encoded and then decoded when received.  (You can't send "Unicode characters".)   UTF-8 is used now (for obvious reasons).  http://tools.ietf.org/html/rfc3986.
You are specifying %D6 -- although the Unicode character is U+00D6, its UTF-8 octet sequence is 0xC3 0x96, so it should be sent as %C3%96. See: http://www.fileformat.info/info/unicode/char/00D6/index.htm
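As an illustration of the difference, the following sketch percent-encodes the same text once as UTF-8 and once as ISO-8859-1. The percent_encode helper is made up for this example (a real application would more likely use a URI module); only the core Encode module is assumed.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Illustrative helper: percent-encode every octet outside the
# RFC 3986 "unreserved" set.
sub percent_encode {
    my ($octets) = @_;
    $octets =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord($1))/ge;
    return $octets;
}

my $text = "\x{D6}lso\x{DF}e";    # the characters U+00D6 l s o U+00DF e

my $as_utf8   = percent_encode( encode('UTF-8',      $text) );
my $as_latin1 = percent_encode( encode('ISO-8859-1', $text) );

print "UTF-8:   $as_utf8\n";      # %C3%96lso%C3%9Fe
print "Latin-1: $as_latin1\n";    # %D6lso%DFe
```

The Latin-1 form is what the original URL contained; the UTF-8 form is what a UTF-8 decoder expects.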
Unless otherwise instructed, Catalyst uses UTF-8 as the encoding for decoding query parameters -- query parameters are decoded from UTF-8 octets to Perl characters.
As your example showed, if you use invalid UTF-8 sequences then Encode::decode() as used by Catalyst will replace those with the U+FFFD substitution character "�".
This may or may not be what you want.   Personally, I think it's not correct to silently modify user input.   You intended to pass "Ölsoße" but ended up with "�lso�e" -- is that really the data you would want to process/store for the request?   Seems unlikely.
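The substitution can be reproduced outside Catalyst with a few lines. This is only a sketch: percent_decode is an illustrative helper, and the decode() call uses Encode's default CHECK behaviour, which is assumed here to match what happens to the query string.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Illustrative helper: turn %XX escapes back into raw octets
# (no character decoding yet).
sub percent_decode {
    my ($s) = @_;
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $s;
}

my $octets = percent_decode('%D6lso%DFe');   # Latin-1 octets, not valid UTF-8
my $chars  = decode('UTF-8', $octets);       # default CHECK: substitute, don't die

# Print code points in ASCII so the output does not depend on the terminal.
my $code_points = join ' ', map { sprintf 'U+%04X', ord } split //, $chars;
print "$code_points\n";   # U+FFFD U+006C U+0073 U+006F U+FFFD U+0065
```

Both the leading and the trailing non-ASCII octet are replaced with U+FFFD, and nothing signals that the input was malformed.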
If "param" is supposed to be passed as textual, UTF-8-encoded octets, and it isn't, then maybe returning a 400 is a better way of handling that.   That probably would have helped you see what is wrong in this case.
i.e. use "eval { decode( $default_query_encoding, $str, FB_CROAK | LEAVE_SRC ); }" to catch invalid data and return to the client the "$str" that failed and why.
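Spelled out a bit more, the strict variant might look like the sketch below. strict_decode is a hypothetical helper rather than anything Catalyst provides, and the mapping to a 400 status is only indicated, since how the response gets built depends on the application.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper: returns (text, undef) on success, or
# (undef, error message) when $octets is not valid in $encoding.
sub strict_decode {
    my ($encoding, $octets) = @_;
    my $text = eval { decode($encoding, $octets, FB_CROAK | LEAVE_SRC) };
    return (undef, $@ || 'decode failed') unless defined $text;
    return ($text, undef);
}

my ($ok_text,  $ok_err)  = strict_decode('UTF-8', "\xC3\x96lso\xC3\x9Fe");
my ($bad_text, $bad_err) = strict_decode('UTF-8', "\xD6lso\xDFe");

# A caller could map the failure to a 400 Bad Request and echo the reason.
my $status = defined $bad_err ? 400 : 200;
print "status=$status\n";
```

LEAVE_SRC keeps the original octets intact, so the failing string is still available for the error message.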
Of course, it is also possible that you have some query parameters that you want decoded as UTF-8 and some that might represent something else (a raw sequence of bytes), and want more manual control.  In that case $c->config->{do_not_decode_query} could be used to bypass the decoding.   But then, you must manually decode() yourself.
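For the mixed case, manual decoding could be sketched roughly as follows. Everything here is an assumption made for illustration: the %is_text policy, the decode_selected helper, and the idea that the application hands over a plain hash of undecoded octets. No Catalyst API is used.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical policy: only these parameters carry UTF-8 text;
# everything else is passed through as raw octets.
my %is_text = ( param => 1 );

# $raw maps parameter names to undecoded octets, as the application
# would see them once automatic decoding is turned off.
sub decode_selected {
    my ($raw) = @_;
    my %out;
    for my $name (keys %$raw) {
        $out{$name} = $is_text{$name}
            ? decode('UTF-8', $raw->{$name}, FB_CROAK | LEAVE_SRC)
            : $raw->{$name};
    }
    return \%out;
}

my $params = decode_selected({
    param => "\xC3\x96lso\xC3\x9Fe",   # UTF-8 text
    blob  => "\xD6\xDF",               # arbitrary bytes, left untouched
});

printf "param has %d characters, blob has %d octets\n",
    length($params->{param}), length($params->{blob});
```

Because the text parameters are decoded with FB_CROAK, invalid UTF-8 in them still raises an exception the caller can catch, while binary parameters are never touched.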
Bill Moseley
moseley at hank.org
