[Catalyst] UTF8 problems with plugin::encoding

Mark Ellis m at rkellis.com
Tue Jul 22 11:31:38 GMT 2014


I don't think there's anything you can do. Your app wants UTF-8 and
they're sending something else which doesn't map. Since you can't know
what encoding it is in, all you can do is die if it doesn't map, which
is what the plugin does.
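
To illustrate why guessing the encoding doesn't work, here is a minimal
Python sketch (Python only because Bernhard's reproduction script is
python3; Catalyst itself is Perl). The test string is the one from that
script; the variable names are my own. The same bytes are rejected by a
strict UTF-8 decode, decode to the intended text as GB2312, and also decode
without error as Latin-1 into garbage, so a decode that succeeds does not
tell you which encoding the sender meant.

```python
# Sketch only: the test string comes from the reproduction script quoted
# in this thread; all variable names are illustrative assumptions.
text = "深入 so what"
raw = text.encode("gb2312")  # bytes like the ones the crawler sends

# A strict UTF-8 decode rejects the bytes outright.
try:
    raw.decode("utf-8")
    strict_utf8_ok = True
except UnicodeDecodeError:
    strict_utf8_ok = False

# Decoding with the right encoding recovers the text ...
as_gb2312 = raw.decode("gb2312")

# ... but Latin-1 accepts any byte sequence, so it "works" too and
# yields mojibake instead of the original text.
as_latin1 = raw.decode("latin-1")

print(strict_utf8_ok)
print(as_gb2312 == text)
print(as_latin1 == text)
```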

As far as I can tell, the Ruby middleware I found handles this by returning
a 400 Bad Request, which Catalyst does as well. So there's no effect, other
than the noise in the logs.
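
Roughly, that behaviour amounts to the following. This is a Python sketch
of the idea only, not Catalyst's or the Ruby middleware's actual code;
`status_for_params` is a hypothetical helper name, and the parameters are
assumed to arrive as raw bytes.

```python
def status_for_params(raw_params: dict) -> int:
    """Return 400 if any key or value is not valid UTF-8, else 200.

    Hypothetical helper for illustration; not part of Catalyst.
    """
    for key, value in raw_params.items():
        try:
            key.decode("utf-8")
            value.decode("utf-8")
        except UnicodeDecodeError:
            return 400  # reject the request instead of guessing an encoding
    return 200


bad = "深入 so what".encode("gb2312")
good = "深入 so what".encode("utf-8")
print(status_for_params({bad: bad}))    # 400
print(status_for_params({good: good}))  # 200
```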


On 22 July 2014 11:21, Bernhard Bauch <bauch at zsi.at> wrote:

> here’s also a perl-script that does it
>
> ------------------------------------------
> use strict;
> use warnings;
> use utf8;    # the source contains a literal UTF-8 string
> use Encode qw(decode encode);
> use LWP::UserAgent;
>
> my $str = '深入 so what';
> my $oct = encode("gb2312", $str);    # GB2312 bytes, invalid as UTF-8
> my $url = 'http://wbc-inco.net/object/event/past';
> my $ua       = LWP::UserAgent->new();
> my $response = $ua->post( $url, { $oct => $oct } );
> my $content  = $response->decoded_content();
> ------------------------------------------
>
> On 22 Jul 2014, at 11:33, Bernhard Bauch <bauch at zsi.at> wrote:
>
> hey all,
>
> this python3 script triggers the error ...
>
> --------------------------------
> import httplib2
> import urllib.parse
>
> somestr = '深入 so what'
> encodedstr = somestr.encode('gb2312')
> url = 'http://myappdomain.com/search'
> body = { encodedstr:encodedstr }
> headers = {
>     'Content-type': 'application/x-www-form-urlencoded',
>     'Accept': 'text/html, application/xml;q=0.9, application/xhtml+xml, '
>               'image/png, image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1',
>     'Accept-Encoding': 'gzip, deflate',
>     'Accept-Language': 'zh;q=0.9,en;q=0.8'
> }
> http = httplib2.Http()
> response, content = http.request(url, 'POST', headers=headers,
> body=urllib.parse.urlencode(body))
> ————————————————
>
> now it's possible to reproduce the error :)
>
> any ideas how to solve this?
> ruby people did this by adding a utf8-sanitizer in the middleware.
>
> bye, bernhard
>
>
> On 21 Jul 2014, at 22:19, Bernhard Bauch <bauch at zsi.at> wrote:
>
> more news..
>
> the crawler/search engine that triggers these errors is
> http://easou.com
>
> this search engine delivers its pages not in UTF-8 but in "gb2312", which
> is Simplified Chinese.
> if I open the "wrong utf8" parameters from the faulty requests as
> "gb2312", some readable characters appear.
> this leads me to: Catalyst does not handle requests with gb2312-encoded
> parameters (because they are not UTF-8), and the request does not declare
> that it is encoded in anything other than UTF-8.
>
> any ideas what to do ?
>
> bye, bernhard
>
>
>
> On 21 Jul 2014, at 14:36, Roman Winfinit <winfinit at gmail.com> wrote:
>
> Hello,
>
> How are you running your application, i.e. mod_perl, fcgi, fcgi +
> httpd/nginx, plack + ...? Also, what version of perl are you using and
> what OS?
>
> -roman
> On Jul 21, 2014 6:58 AM, "Bernhard Bauch" <bauch at zsi.at> wrote:
>
>> Hey all,
>>
>> on most of my websites running on the latest Catalyst (5.90065) I keep
>> getting UTF-8 related errors.
>> they usually appear when a spider
>> Mozilla/5.0 (compatible; EasouSpider; +
>> http://www.easou.com/search/spider.html)
>> comes along.
>>
>> the error is:
>> Caught exception in engine "UTF8 Error: utf8 "\x98" does not map to
>> Unicode at /usr/local/…./lib/perl5/Catalyst/Plugin/Unicode/Encoding.pm line
>> 167.
>>
>> It took me a while to get the actual parameters the spider sends, because
>> the Catalyst debug messages do not tell that much:
>>
>> —————————————
>> [2014/07/16 15:08:47] [5.255.253.218] [INFO] vim
>> /usr/local/…./lib/perl5/Catalyst.pm +2016: *** Request 164 (0.032/s)
>> [10682] [Wed Jul 16 15:08:47 2014] ***
>> [2014/07/16 15:08:47] [5.255.253.218] [DEBUG] vim
>> /usr/local/…./lib/perl5/Catalyst.pm +2309: Response Code: 400;
>> Content-Type: text/plain; charset=UTF-8; Content-Length: unknown
>> [2014/07/16 15:08:47] [5.255.253.218] [INFO] vim
>> /usr/local/.../lib/perl5/Catalyst.pm +1880: Request took 0.006491s
>> (154.059/s)
>>
>> .----------------------------------------------------------------+-----------.
>> | Action                                                         | Time      |
>> +----------------------------------------------------------------+-----------+
>> '----------------------------------------------------------------+-----------'
>> —————————————
>>
>> I changed the Plugin::Unicode::Encoding plugin a bit to find out what the
>> client sends. the results are these:
>> UTF-8 trash arrives, and the module seems unable to deal with it.
>>
>> ————————————
>> Caught exception in engine "UTF8 Error: utf8 "\x98" does not map to
>> Unicode at /usr/local/…../lib/perl5/Catalyst/Plugin/Unicode/Encoding.pm
>> line 170.
>>  -
>>
>> URL: notice/list
>>
>> PARAMS:$VAR1 = {
>>           'X*Ö^K^@^@^@^@¸®ä
>> ^@^@^@^@8<83>^H^K^@^@^@^@h¡ä
>> ^@^@^@^@Hµä
>> ^@^@^@^@X^Z^N^Q^@^@^@^@ø<91>^F^Q^@^@^@^@Ø^F^N^Q^@^@^@^@¸<92>^F^Q^@^@^@^@(^K^N^Q^@^@^@^@<88>^B^N^Q^@^@^@^@¸úÝ^P^@^@^@^@^X%q^G^@^@^@^@اñ^O^@^@^@^@ØøB.^@^@^@^@èâÝ^P^@^@^@^@XÛ_^L^@^@^@^@ÈíÝ^P^@^@^@^@¸~P^S^@^@^@^@èåÝ^P^@^@^@^@Øný^O^@^@^@^@<88>úÝ^P^@^@^@^@^Xá(
>> ^@^@^@^@ئÆ
>> ^@^@^@^@Øï*^Q^@^@^@^@^X'
>> => '^F^L^@^@^@^@<98>Ûø^O^@^@^@^@Ø~^A^N^@^@^@^@<98>=H>^@^@^@^@ø<99>ó^K^@^@^@^@hÔu^R^@^@^@^@¸<8e>ó^K^@^@^@^@^Xä_^L^@^@^@^@Ø<90>a^G^@^@^@^@hðÉ^O^@^@^@^@8ã*^G^@^@^@^@ØØý^M^@^@^@^@Xùë^F^@^@^@^@^HÜý^M^@^@^@^@8W6^H^@^@^@^@øÐý^M^@^@^@^@xÿÃ^K^@^@^@^@X]i^O^@^@^@^@8^Mÿ^H^@^@^@^@Xû<98>^Q^@^@^@^@x¦h^H^@^@^@^@Xý<98>^Q^@^@^@^@^X=5^H^@^@^@^@^X¦ú^K^@^@^@^@^XVQ^P^@^@^@^@^H^Yû^N^@^@^@^@x¤h^H^@^@^@^@^Xå<98>^Q^@^@^@^@ø¤h^H^@^@^@^@Xé<98>^Q^@^@^@^@X¼h^H^@^@^@^@Ø¡h^H^@^@^@^@øf<82>^Q^@^@^@^@^X>éH^@^@^@^@xv<82>^Q^@^@^@^@X6éH^@^@^@^@xl<82>^Q^@^@^@^@83Ì^G^@^@^@^@Xl<82>^Q^@^@^@^@¸Ñý^M^@^@^@^@xr<82>^Q^@^@^@^@H[^H^Q^@^@^@^@^X|<82>^Q^@^@^@^@¸Ë¢^K^@^@^@^@¸u<82>^Q^@^@^@^@<98>Á¢^K^@^@^@^@Øp<82>^Q^@^@^@^@8Í¢^K^@^@^@^@Øl<82>^Q^@^@^@^@XË¢^K^@^@^@^@Xq<82>^Q^@^@^@^@^Xi^W^H^@^@^@^@Xc<82>^Q^@^@^@^@¸Å¢^K^@^@^@^@8h<82>^Q^@^@^@^@<98>Т^K^@^@^@^@¨fÐ^Q^@^@^@^@ØÉ=^R^@^@^@^@ÀC<95>^M^@^@^@^@°S<95>^M^@^@^@^@^PI<95>^M^@^@^@^@À\\<95>^M^@^@^@^@ðE<95>^M^@^@^@^@<80>B<95>^M^@^@^@^@@P<95>^M^@^@^@^@<80>Q<95>^M^@^@^@^@
>> J<95>^M^@^@^@^@p\\<95>^M^@^@^@^@àU<95>^M^@^@^@^@àF<95>^M^@^@^@^@àA<95>^M^@^@^@^@^@<9e>ô^P^@^@^@^@°<9d>ô^P^@^@^@^@0<91>ô^P^@^@^@^@ <9e>ô^P^@^@^@^@^P<8e>ô^P^@^@^@^@
>> <88>ô^P^@^@^@^@Ð<82>ô^P^@^@^@^@
>> <8d>ô^P^@^@^@^@<90><95>ô^P^@^@^@^@à<90>ô^P^@^@^@^@@<95>ô^P^@^@^@^@P<8f>ô^P^@^@^@^@<90><81>ô^P^@^@^@^@ <97>ô^P^@^@^@^@Ð<8c>ô^P^@^@^@^@p<88>ô^P^@^@^@^@P<99>ô^P^@^@^@^@<90><90>ô^P^@^@^@^@@<9a>ô^P^@^@^@^@0<9b>ô^P^@^@^@'
>>         };
>>
>>
>>  // value: $VAR1
>> = '^F^L^@^@^@^@<98>Ûø^O^@^@^@^@Ø~^A^N^@^@^@^@<98>=H>^@^@^@^@ø<99>ó^K^@^@^@^@hÔu^R^@^@^@^@¸<8e>ó^K^@^@^@^@^Xä_^L^@^@^@^@Ø<90>a^G^@^@^@^@hðÉ^O^@^@^@^@8ã*^G^@^@^@^@ØØý^M^@^@^@^@Xùë^F^@^@^@^@^HÜý^M^@^@^@^@8W6^H^@^@^@^@øÐý^M^@^@^@^@xÿÃ^K^@^@^@^@X]i^O^@^@^@^@8^Mÿ^H^@^@^@^@Xû<98>^Q^@^@^@^@x¦h^H^@^@^@^@Xý<98>^Q^@^@^@^@^X=5^H^@^@^@^@^X¦ú^K^@^@^@^@^XVQ^P^@^@^@^@^H^Yû^N^@^@^@^@x¤h^H^@^@^@^@^Xå<98>^Q^@^@^@^@ø¤h^H^@^@^@^@Xé<98>^Q^@^@^@^@X¼h^H^@^@^@^@Ø¡h^H^@^@^@^@øf<82>^Q^@^@^@^@^X>éH^@^@^@^@xv<82>^Q^@^@^@^@X6éH^@^@^@^@xl<82>^Q^@^@^@^@83Ì^G^@^@^@^@Xl<82>^Q^@^@^@^@¸Ñý^M^@^@^@^@xr<82>^Q^@^@^@^@H[^H^Q^@^@^@^@^X|<82>^Q^@^@^@^@¸Ë¢^K^@^@^@^@¸u<82>^Q^@^@^@^@<98>Á¢^K^@^@^@^@Øp<82>^Q^@^@^@^@8Í¢^K^@^@^@^@Øl<82>^Q^@^@^@^@XË¢^K^@^@^@^@Xq<82>^Q^@^@^@^@^Xi^W^H^@^@^@^@Xc<82>^Q^@^@^@^@¸Å¢^K^@^@^@^@8h<82>^Q^@^@^@^@<98>Т^K^@^@^@^@¨fÐ^Q^@^@^@^@ØÉ=^R^@^@^@^@ÀC<95>^M^@^@^@^@°S<95>^M^@^@^@^@^PI<95>^M^@^@^@^@À\\<95>^M^@^@^@^@ðE<95>^M^@^@^@^@<80>B<95>^M^@^@^@^@@P<95>^M^@^@^@^@<80>Q<95>^M^@^@^@^@
>> J<95>^M^@^@^@^@p\\<95>^M^@^@^@^@àU<95>^M^@^@^@^@àF<95>^M^@^@^@^@àA<95>^M^@^@^@^@^@<9e>ô^P^@^@^@^@°<9d>ô^P^@^@^@^@0<91>ô^P^@^@^@^@ <9e>ô^P^@^@^@^@^P<8e>ô^P^@^@^@^@
>> <88>ô^P^@^@^@^@Ð<82>ô^P^@^@^@^@
>> <8d>ô^P^@^@^@^@<90><95>ô^P^@^@^@^@à<90>ô^P^@^@^@^@@<95>ô^P^@^@^@^@P<8f>ô^P^@^@^@^@<90><81>ô^P^@^@^@^@ <97>ô^P^@^@^@^@Ð<8c>ô^P^@^@^@^@p<88>ô^P^@^@^@^@P<99>ô^P^@^@^@^@<90><90>ô^P^@^@^@^@@<9a>ô^P^@^@^@^@0<9b>ô^P^@^@^@';
>>
>>
>> headers: Connection: close
>> Accept: text/html, application/xml;q=0.9, application/xhtml+xml,
>> image/png, image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1
>> Accept-Encoding: gzip, deflate
>> Accept-Language: zh;q=0.9,en;q=0.8
>> Host: wbc-inco.net
>> User-Agent: Mozilla/5.0 (compatible; EasouSpider; +
>> http://www.easou.com/search/spider.html)
>> Content-Length: 927
>> Content-Type: application/x-www-form-urlencoded
>> REFER: http://b------.net/“
>>
>> ————————————
>>
>> to understand the logging above: this is what i added /changed in
>> the Catalyst::Plugin::Unicode::Encoding
>>
>> ————————————————————
>> around line 168:
>>
>>         my $val;
>>         eval {
>>             $val = Encode::is_utf8( $value )
>>                 ? $value
>>                 : $enc->decode( $value, $CHECK );
>>         };
>>         if ($@) {
>>             # Oops! Decoding failed: dump the request details.
>>             use Data::Dumper;
>>             my $params  = $self->req->parameters;
>>             my $headers = $self->req->headers->as_string;
>>             die "UTF8 Error: $@ - \n\nURL: " . $self->req->path
>>                 . "\n\nPARAMS:" . Dumper( $params )
>>                 . "\n\n // value: " . Dumper($value)
>>                 . "\n\nheaders: " . $headers;
>> ….
>> ————————————————————
>>
>> I guess my Catalyst Apps are not the only ones with these errors ?
>>
>>
>> about my App settings / config:
>>
>> app-config has
>> encoding                UTF-8
>>
>> App.pm does not load Unicode::Encoding anymore (since this is not needed
>> when using the latest Catalyst: 5.90065)
>>
>> i am using postgres with
>> pg_enable_utf8 1
>> (but the error above is far away from any DB related problem, I guess)
>>
>> using Catalyst::Plugin::Unicode::Encoding version 2.1 (coming with
>> Catalyst)
>>
>> I just checked the tracker for Catalyst on CPAN; there is a UTF-8
>> issue ticket
>> https://rt.cpan.org/Public/Bug/Display.html?id=94957
>> but it does not look as if it were this problem ...
>>
>> Any ideas what to do?
>> Add an issue/ticket?
>>
>> thanks for feedback,
>> bernhard bauch
>>
>>
>>
>>>> Bernhard Bauch, Webdevelopment
>> ZSI - Zentrum für soziale Innovation
>> bauch at zsi.at
>> Skype: berni-zsi
>>
>>
>> _______________________________________________
>> List: Catalyst at lists.scsys.co.uk
>> Listinfo: http://lists.scsys.co.uk/cgi-bin/mailman/listinfo/catalyst
>> Searchable archive:
>> http://www.mail-archive.com/catalyst@lists.scsys.co.uk/
>> Dev site: http://dev.catalyst.perl.org/
>>