[Catalyst] Url Encoded UTF8 parameters
John Napiorkowski
jjn1056 at yahoo.com
Mon Aug 3 18:41:08 GMT 2015
I'd be interested in having some sort of flag on the request that indicates whether the incoming query was bad. I can't do a die here for legacy reasons.
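A minimal sketch of what such a flag could look like, outside of Catalyst and using only the core Encode module. The helper name decode_with_flag is hypothetical and is not an existing Catalyst API; it decodes leniently as today and separately reports whether a strict decode would have failed.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper (not part of Catalyst): returns the leniently
# decoded characters plus a flag saying whether the octets were invalid.
sub decode_with_flag {
    my ($octets) = @_;

    # Strict pass: FB_CROAK dies on malformed input, eval catches it.
    my $valid = eval { decode('UTF-8', $octets, FB_CROAK | LEAVE_SRC); 1 };

    # Lenient pass: default behaviour, malformed octets become U+FFFD.
    my $chars = decode('UTF-8', $octets);

    return ($chars, $valid ? 0 : 1);
}

my ($chars, $is_bad) = decode_with_flag("\xD6lso\xDFe");
print "query was bad\n" if $is_bad;
```

The caller keeps the legacy behaviour (no exception) but can still decide to log or reject the request based on the flag.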
jnap
On Sunday, August 2, 2015 9:39 AM, Bill Moseley <moseley at hank.org> wrote:
BTW -- I wonder about the Catalyst behavior here.
On Sat, Aug 1, 2015 at 10:36 PM, Bill Moseley <moseley at hank.org> wrote:
On Sat, Aug 1, 2015 at 6:31 AM, Stefan <maillist at s.profanter.me> wrote:
Hi, if a URL parameter contains a Unicode character (e.g. www.example.com/?param=%D6lso%DFe which stands for param=Ölsoße), the parameter is not correctly parsed as Unicode.
One note here -- data over the wire must be encoded into octets. So, all Unicode characters must be encoded and then decoded when received. (You can't send "Unicode characters".) UTF-8 is used now (for obvious reasons). http://tools.ietf.org/html/rfc3986.
You are specifying %D6, which is the ISO-8859-1 (Latin-1) octet for Ö. Although the Unicode character is U+00D6, its UTF-8 octet sequence is 0xC3 0x96, so the parameter needs to be sent as %C3%96. See: http://www.fileformat.info/info/unicode/char/00D6/index.htm
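To make the difference concrete, here is a small sketch that percent-encodes "Ölsoße" once as UTF-8 and once as ISO-8859-1. It uses only the core Encode module; the percent_encode helper is written for this illustration and is not a library function.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Illustration-only helper: percent-encode every octet that is not in
# the RFC 3986 "unreserved" set.
sub percent_encode {
    my ($octets) = @_;
    $octets =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $octets;
}

my $text = "\x{D6}lso\x{DF}e";    # "Ölsoße" as Perl characters

my $utf8   = percent_encode( encode('UTF-8',      $text) );
my $latin1 = percent_encode( encode('ISO-8859-1', $text) );

print "UTF-8:      $utf8\n";      # %C3%96lso%C3%9Fe
print "ISO-8859-1: $latin1\n";    # %D6lso%DFe
```

The second form is what the original URL contained; the first is what a UTF-8 decoder expects.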
Unless otherwise instructed, Catalyst uses UTF-8 as the encoding for decoding query parameters -- query parameters are decoded from UTF-8 octets to Perl characters.
As your example showed, if you use invalid UTF-8 sequences then Encode::decode() as used by Catalyst will replace each of those with U+FFFD, the replacement character "�".
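The substitution can be reproduced with Encode alone. This sketch assumes the query string has already been percent-unescaped to raw octets; it is not Catalyst's actual code path, only the same decode() call with its default behaviour.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The octets that %D6lso%DFe unescapes to (Latin-1, not valid UTF-8).
my $octets = "\xD6lso\xDFe";

# No CHECK argument: malformed sequences are replaced, nothing dies.
my $chars = decode('UTF-8', $octets);

# Print code points rather than the characters themselves, to avoid
# depending on the terminal's encoding.
my @code_points = map { sprintf 'U+%04X', ord } split //, $chars;
print "@code_points\n";    # U+FFFD U+006C U+0073 U+006F U+FFFD U+0065
```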
This may or may not be what you want. Personally, I think it's not correct to silently modify user input. You intended to pass "Ölsoße" but ended up with "�lso�e" -- is that really the data you would want to process/store for the request? Seems unlikely.
If "param" is supposed to be passed as textual, UTF-8-encoded octets, and it isn't, then maybe returning a 400 is a better way of handling that. That probably would have helped you see what is wrong in this case.
i.e. use "eval { decode( $default_query_encoding, $str, FB_CROAK | LEAVE_SRC ); }" to catch invalid data and return to the client the "$str" that failed and why.
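Spelled out a little more, the idea might look like the sketch below. The strict_decode helper is hypothetical, and how the error is turned into a 400 response is left to the application; only decode(), FB_CROAK and LEAVE_SRC come from Encode.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper: returns (characters, undef) on success or
# (undef, error message) when the octets are not valid in $encoding.
sub strict_decode {
    my ($encoding, $octets) = @_;

    # FB_CROAK makes decode() die on malformed input; LEAVE_SRC keeps
    # $octets unmodified so it can be reported back to the client.
    my $chars = eval { decode($encoding, $octets, FB_CROAK | LEAVE_SRC) };

    return (undef, $@ || 'unknown decoding error') unless defined $chars;
    return ($chars, undef);
}

my ($good, $good_err) = strict_decode('UTF-8', "\xC3\x96lso\xC3\x9Fe");
my ($bad,  $bad_err)  = strict_decode('UTF-8', "\xD6lso\xDFe");

print "valid input decoded to ", length($good), " characters\n";
print "invalid input rejected: $bad_err" if defined $bad_err;
```

An application could respond with a 400 whenever the second return value is defined, including the offending octets and the message in the response body.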
Of course, it is also possible that you have some query parameters that you want decoded as UTF-8 and some that might represent something else (a raw sequence of bytes), and want more manual control. In that case $c->config->{do_not_decode_query} could be used to bypass the decoding. But then, you must manually decode() yourself.
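As a rough sketch of that manual approach: assume the raw, still-encoded parameter values are available as octets in a plain hash. The %raw and %is_text hashes and the parameter names are made up for this example and do not reflect how Catalyst stores parameters.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical raw parameters, as octets, the way they might look when
# automatic decoding is switched off.
my %raw = (
    name  => "\xC3\x96lso\xC3\x9Fe",    # UTF-8 encoded text
    token => "\xD6\x00\xDF",            # opaque bytes, not text
);

# Hypothetical list of parameters that are expected to be text.
my %is_text = ( name => 1 );

my %params;
for my $key (keys %raw) {
    $params{$key} = $is_text{$key}
        ? decode('UTF-8', $raw{$key}, FB_CROAK | LEAVE_SRC)    # dies if invalid
        : $raw{$key};                                          # keep bytes as-is
}

printf "name has %d characters, token has %d bytes\n",
    length $params{name}, length $params{token};
```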
--
Bill Moseley
moseley at hank.org