[Catalyst] URI->new() with utf8 string and Unicode::Encoding will
not work (but URI->new() with utf8 octets will work)
Erik Wasser
erik.wasser at iquer.net
Thu Mar 3 22:38:13 GMT 2011
Hello list,
I was looking into a Unicode/UTF-8 encoding problem and discovered
something strange.
URLs that contain a Unicode character are not decoded correctly by the
Unicode::Encoding plugin.
Here's the simple test case:
1) Create the application and cd into it:
% catalyst.pl MyApp
% cd MyApp
2) Add the plugin Unicode::Encoding in lib/MyApp.pm
3) Replace the 'sub index { ...}' in 'lib/MyApp/Controller/Root.pm' with
the following code:
--------- B< ---------
sub index : Regex('^test$')
{
    my ( $self, $c, $parameter ) = @_;
    $c->response->body( 'length = ' . length($parameter) );
}
--------- B< ---------
4) Add a test script "t/04encoding.t" like this:
--------- B< ---------
#!/usr/bin/perl
use strict;
use warnings;
use Test::More tests => 4;
use Test::Deep;
use HTTP::Status;
use HTTP::Request;
use Data::Dumper;

BEGIN { use_ok 'Catalyst::Test', 'MyApp' }
BEGIN { use_ok 'MyApp::Controller::Root' }

# First URL: percent-encoded UTF-8 octets.
# Second URL: Perl character string containing U+304B.
foreach my $u ('http://localhost/test/%E3%81%8B',
               "http://localhost/test/\x{304b}" )
{
    my $request = HTTP::Request->new(
        'GET' => $u, [ 'Content-Type' => 'text/html; charset=utf8', ],
    );
    print $request->as_string();

    # Both requests should reach the action as one character.
    my $response = request( $request );
    is( $response->content, 'length = 1', 'length = 1' );
}
--------- B< ---------
5) Run the test script:
% perl t/04encoding.t
The first call gives the correct answer 'length = 1' because the 3
percent-encoded octets (E3 81 8B, the UTF-8 encoding of U+304B,
HIRAGANA LETTER KA) are decoded correctly to one character.
The second call gives the wrong answer 'length = 3'.
Please note that the statement "print $request->as_string()" prints
the same request line and header for both URLs:
> GET http://localhost/test/%E3%81%8B
> Content-Type: text/html; charset=utf8
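For reference, a minimal sketch of what 'length = 1' means for this path segment, using only the core Encode module. It undoes the percent-escaping by hand and then decodes the octets as UTF-8; the variable names are made up for illustration and none of this is Catalyst or plugin code.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Path segment as it appears in the printed request line.
my $escaped = '%E3%81%8B';

# Undo the percent-escaping to get the raw octets.
(my $octets = $escaped) =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
my $octet_count = length($octets);

# Decode the octets as UTF-8 to get the character string.
my $chars      = decode('UTF-8', $octets);
my $char_count = length($chars);

printf "octets: %d, characters: %d, code point: U+%04X\n",
    $octet_count, $char_count, ord($chars);
```

If the decode step is skipped, length() sees the three undecoded values, which matches the 'length = 3' answer from the second call.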
My 2 cents: Further investigation brought me to
Catalyst::Plugin::Unicode::Encoding::prepare_action().
The problem is that the value derived from the second URL already
carries Perl's internal UTF8 flag, which means that
"Encode::is_utf8( $_ )" in that method returns true and the plugin
leaves the value undecoded.
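A minimal sketch of that effect, using only core Encode. The helper decode_unless_flagged() is hypothetical and only mimics the kind of check described above; it is not the plugin's code. The sketch also assumes that the path reaches the check as a flagged string that still holds the three undecoded code points, which utf8::upgrade() simulates here.

```perl
use strict;
use warnings;
use Encode qw(decode is_utf8);

# Hypothetical helper, not the plugin's code: decode only when the
# internal UTF8 flag is off.
sub decode_unless_flagged {
    my ($value) = @_;
    return $value if is_utf8($value);
    return decode('UTF-8', $value);
}

# Three octets E3 81 8B, the UTF-8 encoding of U+304B.
my $octets = "\xE3\x81\x8B";

# The same three code points, but stored with the UTF8 flag on.
my $flagged = $octets;
utf8::upgrade($flagged);

my $from_octets  = decode_unless_flagged($octets);
my $from_flagged = decode_unless_flagged($flagged);

printf "from octets:  length = %d\n", length($from_octets);
printf "from flagged: length = %d\n", length($from_flagged);
```

Both inputs compare equal with eq, so the flag alone decides whether the value gets decoded. That is why is_utf8() is a fragile test for "already decoded".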
Before I do anything silly, I would like a second opinion from the list.
Is this fixable? Is Catalyst the problem here? I think not. According
to a bug report against URI (Ticket #43859, "should be _utf8_off -ed raw
data before URI encoding",
https://rt.cpan.org/Ticket/Display.html?id=43859) the problem may be
within URI.
But maybe it's possible to fix this issue in Catalyst's test suite.
Any thoughts?
--
So long... Fuzz