[Catalyst-commits] r14549 - trunk/examples/CatalystAdvent/root/2014
jnapiorkowski at dev.catalyst.perl.org
Tue Dec 16 14:35:58 GMT 2014
Author: jnapiorkowski
Date: 2014-12-16 14:35:57 +0000 (Tue, 16 Dec 2014)
New Revision: 14549
Added:
trunk/examples/CatalystAdvent/root/2014/13.pod
Log:
13
Added: trunk/examples/CatalystAdvent/root/2014/13.pod
===================================================================
--- trunk/examples/CatalystAdvent/root/2014/13.pod (rev 0)
+++ trunk/examples/CatalystAdvent/root/2014/13.pod 2014-12-16 14:35:57 UTC (rev 14549)
@@ -0,0 +1,136 @@
+=head1 UTF8 in GET Query and Form POST
+
+This article covers what is changing in the Holland (current development)
+release around content encoding and unicode support (part two:
+UTF8 in GET and POST parameters).
+
+=head1 Summary
+
+Starting in the upcoming L<Catalyst> release (Holland, which as of this
+writing is dev003 on CPAN and ready for your testing), Unicode encoding will be
+enabled by default. In addition, we've made many fixes around encoding
+and UTF8 throughout the codebase.
+
+This is part two of a three-part series. In this part we look at how UTF8 works
+for your URL query and form POSTed parameters.
+
+=head1 UTF8 in URL query and keywords
+
+The same rules that we find in URL paths also cover URL query parts. Suppose
+one types a URL like this into the browser (again assuming a modernish UI that
+allows unicode):
+
+ http://localhost/example?♥=♥♥
+
+When this goes 'over the wire' to your application server, it's going to be sent as
+percent encoded bytes:
+
+ http://localhost/example?%E2%99%A5=%E2%99%A5%E2%99%A5
+
+When L<Catalyst> encounters this, it decodes first the percent encoding and then the
+utf8, so that we can properly display this information (such as in the debugging
+logs or in a response):
+
+ [debug] Query Parameters are:
+ .-------------------------------------+--------------------------------------.
+ | Parameter                           | Value                                |
+ +-------------------------------------+--------------------------------------+
+ | ♥                                   | ♥♥                                   |
+ '-------------------------------------+--------------------------------------'
+
+All the values and keys that are part of $c->req->query_parameters will be
+utf8 decoded. So you should not need to do anything special to take those
+values/keys and send them to the body response (since as we will see later
+L<Catalyst> will do all the necessary encoding for you).
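+
+As a rough illustration of those two decoding steps, here is a standalone sketch
+using L<URI::Escape> and L<Encode>. It is not Catalyst's internal code, and the
+variable names are made up for the example:
+
+ use strict;
+ use warnings;
+ use Encode qw(decode);
+ use URI::Escape qw(uri_unescape);
+
+ # The query key exactly as it arrives over the wire.
+ my $wire = '%E2%99%A5';
+
+ # Step 1: undo the percent encoding, leaving three raw bytes.
+ my $bytes = uri_unescape($wire);
+
+ # Step 2: decode those UTF-8 bytes into a single character (U+2665).
+ my $char = decode('UTF-8', $bytes);
+
+ # Prints "3 bytes, 1 character"
+ printf "%d bytes, %d character\n", length($bytes), length($char);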
+
+Just like with arguments and captures, you can use utf8 literals (or utf8
+strings) in $c->uri_for:
+
+ use utf8;
+ my $url = $c->uri_for( $c->controller('Root')->action_for('example'), {'♥' => '♥♥'});
+
+When you stringify this object (for use in a template, for example) it will automatically
+do the right thing regarding utf8 encoding and url encoding.
+
+ http://localhost/example?%E2%99%A5=%E2%99%A5%E2%99%A5
+
+Again, what you want here is a properly url encoded version, since ultimately what you
+send over the wire via HTTP needs to be bytes (not unicode characters).
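+
+To see the characters-to-bytes step in isolation, the sketch below does by hand roughly
+what happens to each key and value. It uses C<uri_escape_utf8> from L<URI::Escape> and
+is only an approximation for illustration, not a description of how C<uri_for> is implemented:
+
+ use strict;
+ use warnings;
+ use utf8;
+ use URI::Escape qw(uri_escape_utf8);
+
+ # Encode each character string as UTF-8 bytes, then percent encode the bytes.
+ my $key   = uri_escape_utf8('♥');
+ my $value = uri_escape_utf8('♥♥');
+
+ print "$key=$value\n";    # %E2%99%A5=%E2%99%A5%E2%99%A5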
+
+Remember if you use any utf8 literals in your source code, you should use the
+C<use utf8> pragma.
+
+=head1 UTF8 in Form POST
+
+In general most modern browsers will follow the specification, which says that POSTed
+form fields should use the same encoding that the document was served with. That means
+that if you are using modern Catalyst and serving UTF8 encoded responses, a browser is
+supposed to notice that and encode the form POSTs accordingly.
+
+As a result, since L<Catalyst> now serves UTF8 encoded responses by default, you can
+mostly rely on incoming form POSTs to be so encoded. L<Catalyst> will make this
+assumption and decode accordingly (unless you explicitly turn off encoding). If you are
+running Catalyst in developer debug mode, then you will see the correct unicode characters in
+the debug output. For example if you generate a POST request:
+
+ use Catalyst::Test 'MyApp';
+ use HTTP::Request::Common;    # exports POST
+ use utf8;
+
+ my $res = request POST "/example/posted", ['♥'=>'♥', '♥♥'=>'♥'];
+
+Running in CATALYST_DEBUG=1 mode you should see output like this:
+
+ [debug] Body Parameters are:
+ .-------------------------------------+--------------------------------------.
+ | Parameter                           | Value                                |
+ +-------------------------------------+--------------------------------------+
+ | ♥                                   | ♥                                    |
+ | ♥♥                                  | ♥                                    |
+ '-------------------------------------+--------------------------------------'
+
+And if you had a controller like this:
+
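+ package MyApp::Controller::Example;
+
+ use utf8;    # the source below contains a utf8 literal
+ use base 'Catalyst::Controller';
+
+ sub posted :POST Local {
+     my ($self, $c) = @_;
+     $c->res->content_type('text/plain');
+     # body_parameters holds the decoded form fields; the key must be quoted
+     $c->res->body("hearts => ${\$c->req->body_parameters->{'♥'}}");
+ }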
+
+The following test case would pass:
+
+ use Encode 2.21 'decode_utf8';
+ is decode_utf8($res->content), 'hearts => ♥';
+
+In this case we decode so that we can print and compare strings with multibyte characters.
+
+B<NOTE> In some cases a browser may not follow the specification and may fail to set the form POST
+encoding based on the server response. Catalyst itself doesn't attempt any workarounds, but one
+common approach is to use a hidden form field with a UTF8 value (you might be familiar with
+this from how Ruby on Rails has HTML form helpers that do that automatically). In that case
+some browsers will send UTF8 encoded data if they notice that the hidden input field contains such a
+character. You can also add an HTML attribute to your form tag, which many modern browsers
+will respect, to set the encoding (accept-charset="utf-8"). Lastly, there are some javascript
+based tricks and workarounds for even more odd cases (searching the web for this will return
+a number of approaches). Hopefully as more compliant browsers become popular these edge cases
+will fade.
+
+=head1 Conclusion
+
+Getting utf8 characters from form POSTs and in your URL query should mostly 'do the right
+thing'. Of course there's a bit of an art to this and we expect that over time we'll
+need to build up a cookbook of practices and workarounds to help even more.
+
+In the final article we will look at how L<Catalyst> does response body encoding, including
+streaming, delayed and filehandle responses.
+
+=head1 Author
+
+John Napiorkowski L<jjnapiork at cpan.org|email:jjnapiork at cpan.org>
+
+=cut