[Catalyst] URI->new() with utf8 string and Unicode::Encoding will not work (but URI->new() with utf8 octets will work)

Erik Wasser erik.wasser at iquer.net
Thu Mar 3 22:38:13 GMT 2011

Hello list,

I was tracking down a unicode/utf8/encoding problem and I've discovered
a strange thing.

URLs that contain a Unicode character are not handled correctly by
the Unicode::Encoding plugin.

Here's the simple test case:

1) Create the application and cd into it:
% catalyst.pl MyApp
% cd MyApp

2) Add the plugin Unicode::Encoding in lib/MyApp.pm
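
For step 2, the change would look roughly like the fragment below. This
is only a sketch: it assumes the plugin list that catalyst.pl generated
for a fresh application at the time (-Debug, ConfigLoader,
Static::Simple); your generated file may differ.

```perl
# Fragment of lib/MyApp.pm (assumed default layout; only the last entry is new).
use Catalyst qw/
    -Debug
    ConfigLoader
    Static::Simple
    Unicode::Encoding
/;
```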

3) Replace the 'sub index { ...}' in 'lib/MyApp/Controller/Root.pm' with
the following code:

--------- B< ---------

sub index : Regex('^test$') {
    my ( $self, $c, $parameter ) = @_;

    # Report how many characters the first path argument contains.
    $c->response->body( 'length = ' . length($parameter) );
}

--------- B< ---------

4) Add a test script "t/04encoding.t" like this:

--------- B< ---------


use strict;
use warnings;

use Test::More tests => 4;
use Test::Deep;

use HTTP::Status;
use HTTP::Request;
use Data::Dumper;

BEGIN { use_ok 'Catalyst::Test', 'MyApp' }
BEGIN { use_ok 'MyApp::Controller::Root' }

# First URL: already percent-encoded octets. Second URL: a character string.
foreach my $u ( 'http://localhost/test/%E3%81%8B',
                "http://localhost/test/\x{304b}" ) {
    my $request = HTTP::Request->new(
        'GET' => $u, [ 'Content-Type' => 'text/html; charset=utf8', ],
    );
    print $request->as_string();
    my $response = request( $request );
    is( $response->content, 'length = 1', 'length = 1' );
}

--------- B< ---------

5) Run the test script:

% perl t/04encoding.t

The first call gives the correct answer 'length = 1' because the 3
UTF-8 octets were decoded correctly into one character.

The second call will give the wrong answer 'length = 3'.
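
The difference between the two answers can be reproduced outside
Catalyst. The following is a minimal sketch using only core modules; the
variable names are made up for illustration and the percent-decoding is
done by hand rather than by any Catalyst code.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Percent-decode the path segment by hand: this yields three raw octets.
my $octets = '%E3%81%8B';
$octets =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;

# Decoding those octets as UTF-8 yields a single character (U+304B).
my $chars = decode( 'UTF-8', $octets );

printf "undecoded: length = %d\n", length $octets;
printf "decoded:   length = %d\n", length $chars;
```

The undecoded string has length 3 and the decoded one has length 1, which
matches the two answers seen in the test.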

Please note that the statement "print $request->as_string()" prints
the same request line and header for both URLs:

> GET http://localhost/test/%E3%81%8B
> Content-Type: text/html; charset=utf8

My 2 cents:  Further investigation brought me to
The problem is that the second URL from above is already a utf8 string,
which means that "Encode::is_utf8( $_ )" in the named method returns
true and the plugin does nothing.
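
To illustrate the effect, here is a minimal sketch of such a guard. The
helper name decode_unless_flagged is hypothetical and this is not the
plugin's actual code; it only mimics the "skip if Encode::is_utf8 is
true" behaviour described above. utf8::upgrade is used to get a string
that still holds the three undecoded octets but has the UTF8 flag set.

```perl
use strict;
use warnings;
use Encode qw(decode is_utf8);

# Hypothetical stand-in for the guard described above.
sub decode_unless_flagged {
    my ($value) = @_;
    return $value if is_utf8($value);    # flagged: assumed already decoded
    return decode( 'UTF-8', $value );    # unflagged: decode the octets
}

my $plain   = "\xE3\x81\x8B";            # three octets, UTF8 flag off
my $flagged = "\xE3\x81\x8B";
utf8::upgrade($flagged);                 # same three characters, flag on

printf "plain:   length = %d\n", length decode_unless_flagged($plain);
printf "flagged: length = %d\n", length decode_unless_flagged($flagged);
```

The flagged string passes through unchanged, so its length stays 3, while
the plain one is decoded to a single character.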

Before I do some silly stuff I want to hear a second opinion from the list.

Is this fixable? Is Catalyst the problem here? I think not. According
to a bug report against URI (ticket #43859, "should be _utf8_off -ed
raw data before URI encoding",
https://rt.cpan.org/Ticket/Display.html?id=43859) the problem may be
within URI.

But maybe it's possible to fix this issue in the test suite of Catalyst.
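
One possible workaround on the test side, as a sketch only: encode the
URL to UTF-8 octets before it is handed to HTTP::Request->new. This
assumes, as the subject line says, that URI->new() copes with utf8
octets; I have not checked it against every URI version. The variable
names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8 is_utf8);

my $url    = "http://localhost/test/\x{304b}";    # character string, UTF8 flag on
my $octets = encode_utf8($url);                   # same URL as octets, flag off

printf "characters: length %d, flagged: %s\n",
    length $url, is_utf8($url) ? 'yes' : 'no';
printf "octets:     length %d, flagged: %s\n",
    length $octets, is_utf8($octets) ? 'yes' : 'no';
```

In the test script, $octets would then be passed in place of $u.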

Any thoughts?

So long... Fuzz
