[Catalyst] URI->new() with utf8 string and Unicode::Encoding will not work (but URI->new() with utf8 octets will work)

Erik Wasser erik.wasser at iquer.net
Thu Mar 3 22:38:13 GMT 2011


Hello list,

While chasing a unicode/utf8/encoding problem in my application I
discovered a strange thing.

URLs containing a Unicode character are not decoded correctly by
the Unicode::Encoding plugin.

Here's the simple test case:

1) Create the application and cd into it:
% catalyst.pl MyApp
% cd MyApp

2) Add the plugin Unicode::Encoding in lib/MyApp.pm

3) Replace the 'sub index { ...}' in 'lib/MyApp/Controller/Root.pm' with
the following code:

--------- B< ---------

sub index : Regex('^test$')
{
    my ( $self, $c, $parameter ) = @_;

    $c->response->body( 'length = ' . length($parameter) );
}

--------- B< ---------

4) Add a test script "t/04encoding.t" like this:

--------- B< ---------

#!/usr/bin/perl

use strict;
use warnings;

use Test::More tests => 4;
use Test::Deep;

use HTTP::Status;
use HTTP::Request;
use Data::Dumper;

BEGIN { use_ok 'Catalyst::Test', 'MyApp' }
BEGIN { use_ok 'MyApp::Controller::Root' }

foreach my $u ('http://localhost/test/%E3%81%8B',
"http://localhost/test/\x{304b}" )
{
    my $request = HTTP::Request->new(
        'GET'=> $u, [ 'Content-Type' => 'text/html; charset=utf8', ],
    );
    print $request->as_string();
    my $response = request( $request );
    is( $response->content, 'length = 1', 'length = 1' );
}

--------- B< ---------

5) Run the test script:

% perl t/04encoding.t

The first request gives the correct answer 'length = 1' because the 3
UTF-8 octets (E3 81 8B, i.e. the hiragana character U+304B) are decoded
correctly to one character.

The second request gives the wrong answer 'length = 3'.

Please note that the statement "print $request->as_string()" prints
the same request line and header for both URLs:

> GET http://localhost/test/%E3%81%8B
> Content-Type: text/html; charset=utf8
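
To illustrate why both URLs end up looking identical on the wire, here is a minimal sketch using only core Perl. The helper percent_encode_utf8 is hypothetical and hand-rolled for this example; it is not URI's actual implementation, it only mimics the visible effect of UTF-8 encoding a wide character and then percent-escaping the resulting octets.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (an assumption for illustration, not URI's real code):
# encode the character string as UTF-8, then percent-escape every octet
# outside a small set of characters that are left untouched.
sub percent_encode_utf8 {
    my ($string) = @_;
    my $octets = Encode::encode( 'UTF-8', $string );
    $octets =~ s{([^A-Za-z0-9\-._~/:])}{ sprintf '%%%02X', ord $1 }ge;
    return $octets;
}

# The second URL from the test script, containing the wide character U+304B.
my $escaped = percent_encode_utf8("http://localhost/test/\x{304b}");
print "$escaped\n";
```

The printed URL is the same as the first, pre-escaped URL in the test script, so the difference between the two requests is not visible in the serialized request.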

My 2 cents: further investigation led me to
Catalyst::Plugin::Unicode::Encoding::prepare_action().
The problem is that the argument from the second URL already arrives as
a string with the UTF-8 flag set, which means that
"Encode::is_utf8( $_ )" in that method returns true and the plugin
leaves the value undecoded.
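
A minimal sketch of that failure mode, using only core Perl. The helper decode_unless_flagged is hypothetical; it only mimics a guard of the form "skip the value if Encode::is_utf8 is true" and is not the plugin's actual code. How the flagged string arises inside the test request is likewise an assumption; here it is simulated with utf8::upgrade.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical guard (an assumption, not the plugin's real code):
# decode a value only when its UTF-8 flag is off.
sub decode_unless_flagged {
    my ($value) = @_;
    return $value if Encode::is_utf8($value);
    return Encode::decode( 'UTF-8', $value );
}

# The three UTF-8 octets of U+304B, as they come out of percent-unescaping.
my $octets = "\xE3\x81\x8B";

# The same three characters, but stored with the UTF-8 flag switched on.
my $flagged = $octets;
utf8::upgrade($flagged);

for my $case ( [ 'octets', $octets ], [ 'flagged', $flagged ] ) {
    my ( $label, $value ) = @$case;
    printf "%-8s flag=%d length=%d\n",
        $label,
        ( Encode::is_utf8($value) ? 1 : 0 ),
        length( decode_unless_flagged($value) );
}
```

The unflagged octets are decoded to a single character, while the flagged copy passes through the guard untouched and keeps its length of 3, which matches the two results seen in the test.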

Before I do anything silly, I would like a second opinion from the list.

Is this fixable? Is Catalyst the problem here? I think not. According
to the bug filed against URI (Ticket #43859, "should be _utf8_off -ed
raw data before URI encoding",
https://rt.cpan.org/Ticket/Display.html?id=43859) the problem may lie
within URI.

But maybe it's possible to work around this issue in Catalyst's test suite.

Any thoughts?

-- 
So long... Fuzz
