[Catalyst] UTF8 problems with plugin::encoding

Bernhard Bauch bauch at zsi.at
Tue Jul 22 10:21:07 GMT 2014


here's also a perl script that reproduces it:

------------------------------------------
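use strict;
use warnings;
use utf8;    # the source contains a UTF-8 string literal
use Encode qw(encode);
use LWP::UserAgent;

my $str = '深入 so what';
my $oct = encode("gb2312", $str);    # GB2312 octets, not valid UTF-8
my $url = 'http://wbc-inco.net/object/event/past';
my $ua       = LWP::UserAgent->new();
# POST the raw octets as both parameter name and value
my $response = $ua->post( $url, { $oct => $oct } );
my $content  = $response->decoded_content();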
------------------------------------------
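
A note on why a request like this trips the decoder: the sketch below (in Python, like the quoted reproduction further down) builds the same kind of form body and checks the received bytes against strict UTF-8. It is only an illustration under assumptions. The helper name is_valid_utf8 is made up for this example, and nothing here is Catalyst code.

```python
from urllib.parse import urlencode, parse_qsl

# Same sample string as in the scripts in this thread.
text = '深入 so what'
gb_bytes = text.encode('gb2312')

# urlencode() percent-encodes bytes keys and values as they are,
# so the form body carries raw GB2312 octets.
body = urlencode({gb_bytes: gb_bytes})

# Server side: recover the raw octets. Latin-1 maps every byte to exactly
# one code point, so this round trip loses nothing.
pairs = parse_qsl(body, encoding='latin-1')
raw_value = pairs[0][1].encode('latin-1')


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8 (hypothetical helper)."""
    try:
        data.decode('utf-8')
    except UnicodeDecodeError:
        return False
    return True


print(is_valid_utf8(raw_value))               # the GB2312 octets
print(is_valid_utf8(text.encode('utf-8')))    # the same text sent as UTF-8
print(raw_value.decode('gb2312') == text)     # readable again as GB2312
```

The request carries no charset information, so a server configured for UTF-8 has no way to know the octets were meant as GB2312.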

On 22 Jul 2014, at 11:33, Bernhard Bauch <bauch at zsi.at> wrote:

> hey all,
> 
> this python3 script triggers the error:
> 
> --------------------------------
> import httplib2
> import urllib.parse
> 
> somestr = '深入 so what'
> encodedstr = somestr.encode('gb2312')
> url = 'http://myappdomain.com/search'   
> body = { encodedstr:encodedstr }
> headers = {
>     'Content-type': 'application/x-www-form-urlencoded', 
>     'Accept': 'text/html, application/xml;q=0.9, application/xhtml+xml, image/png, image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1',
>     'Accept-Encoding': 'gzip, deflate',
>     'Accept-Language': 'zh;q=0.9,en;q=0.8'
> }
> http = httplib2.Http()
> response, content = http.request(url, 'POST', headers=headers, body=urllib.parse.urlencode(body))
> ————————————————
> 
> now it's possible to reproduce the error :)
> 
> any ideas how to solve this?
> ruby people solved this by adding a utf8-sanitizer to the middleware.
> 
> bye, bernhard
> 
> 
> On 21 Jul 2014, at 22:19, Bernhard Bauch <bauch at zsi.at> wrote:
> 
>> more news..
>> 
>> the crawler/search engine that triggers these errors is
>> 	http://easou.com
>> 
>> this search engine delivers its pages not in UTF-8 but in “gb2312”, which is Simplified Chinese.
>> if i decode the “wrong utf8” parameters from the faulty requests as “gb2312”, some readable characters appear.
>> this leads me to: catalyst does not handle requests with gb2312-encoded parameters (because they are not utf8), and the request does not declare that it is encoded in anything other than utf8.
>> 
>> any ideas what to do ?
>> 
>> bye, bernhard
>> 
>> 
>> 
>> On 21 Jul 2014, at 14:36, Roman Winfinit <winfinit at gmail.com> wrote:
>> 
>>> Hello,
>>> 
>>> How are you running your application? I.e. mod_perl, fcgi, fcgi + httpd/nginx, plack + ...? Also, what version of perl are you using, and on what OS?
>>> 
>>> -roman
>>> 
>>> On Jul 21, 2014 6:58 AM, "Bernhard Bauch" <bauch at zsi.at> wrote:
>>> Hey all,
>>> 
>>> on most of my websites running on the latest catalyst (5.90065) i keep getting utf8 related errors.
>>> they usually appear when a spider 
>>> 	Mozilla/5.0 (compatible; EasouSpider; +http://www.easou.com/search/spider.html)
>>> comes across.
>>> 
>>> the error is:
>>> 	Caught exception in engine "UTF8 Error: utf8 "\x98" does not map to Unicode at /usr/local/…./lib/perl5/Catalyst/Plugin/Unicode/Encoding.pm line 167.
>>> 
>>> It took me a while to get the actual parameters the spider sends, because the debug messages of catalyst do not tell that much:
>>> 
>>> —————————————
>>> [2014/07/16 15:08:47] [5.255.253.218] [INFO] vim /usr/local/…./lib/perl5/Catalyst.pm +2016: *** Request 164 (0.032/s) [10682] [Wed Jul 16 15:08:47 2014] ***
>>> [2014/07/16 15:08:47] [5.255.253.218] [DEBUG] vim /usr/local/…./lib/perl5/Catalyst.pm +2309: Response Code: 400; Content-Type: text/plain; charset=UTF-8; Content-Length: unknown
>>> [2014/07/16 15:08:47] [5.255.253.218] [INFO] vim /usr/local/.../lib/perl5/Catalyst.pm +1880: Request took 0.006491s (154.059/s)
>>> .---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------.
>>> | Action                                                                                                                                                                                            | Time      |
>>> +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+
>>> '---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------'
>>> —————————————
>>> 
>>> i changed the Plugin::Unicode::Encoding plugin a bit to find out what the client sends. the results are these:
>>> UTF8 trash arrives - and the module seems unable to deal with it…
>>> 
>>> ————————————
>>> Caught exception in engine "UTF8 Error: utf8 "\x98" does not map to Unicode at /usr/local/…../lib/perl5/Catalyst/Plugin/Unicode/Encoding.pm line 170.
>>>  -
>>> 
>>> URL: notice/list
>>> 
>>> PARAMS:$VAR1 = {
>>>           'X*Ö^K^@^@^@^@¸®ä
>>> ^@^@^@^@8<83>^H^K^@^@^@^@h¡ä
>>> ^@^@^@^@Hµä
>>> ^@^@^@^@X^Z^N^Q^@^@^@^@ø<91>^F^Q^@^@^@^@Ø^F^N^Q^@^@^@^@¸<92>^F^Q^@^@^@^@(^K^N^Q^@^@^@^@<88>^B^N^Q^@^@^@^@¸úÝ^P^@^@^@^@^X%q^G^@^@^@^@اñ^O^@^@^@^@ØøB.^@^@^@^@èâÝ^P^@^@^@^@XÛ_^L^@^@^@^@ÈíÝ^P^@^@^@^@¸~P^S^@^@^@^@èåÝ^P^@^@^@^@Øný^O^@^@^@^@<88>úÝ^P^@^@^@^@^Xá( ^@^@^@^@ئÆ
>>> ^@^@^@^@Øï*^Q^@^@^@^@^X' => '^F^L^@^@^@^@<98>Ûø^O^@^@^@^@Ø~^A^N^@^@^@^@<98>=H>^@^@^@^@ø<99>ó^K^@^@^@^@hÔu^R^@^@^@^@¸<8e>ó^K^@^@^@^@^Xä_^L^@^@^@^@Ø<90>a^G^@^@^@^@hðÉ^O^@^@^@^@8ã*^G^@^@^@^@ØØý^M^@^@^@^@Xùë^F^@^@^@^@^HÜý^M^@^@^@^@8W6^H^@^@^@^@øÐý^M^@^@^@^@xÿÃ^K^@^@^@^@X]i^O^@^@^@^@8^Mÿ^H^@^@^@^@Xû<98>^Q^@^@^@^@x¦h^H^@^@^@^@Xý<98>^Q^@^@^@^@^X=5^H^@^@^@^@^X¦ú^K^@^@^@^@^XVQ^P^@^@^@^@^H^Yû^N^@^@^@^@x¤h^H^@^@^@^@^Xå<98>^Q^@^@^@^@ø¤h^H^@^@^@^@Xé<98>^Q^@^@^@^@X¼h^H^@^@^@^@Ø¡h^H^@^@^@^@øf<82>^Q^@^@^@^@^X>éH^@^@^@^@xv<82>^Q^@^@^@^@X6éH^@^@^@^@xl<82>^Q^@^@^@^@83Ì^G^@^@^@^@Xl<82>^Q^@^@^@^@¸Ñý^M^@^@^@^@xr<82>^Q^@^@^@^@H[^H^Q^@^@^@^@^X|<82>^Q^@^@^@^@¸Ë¢^K^@^@^@^@¸u<82>^Q^@^@^@^@<98>Á¢^K^@^@^@^@Øp<82>^Q^@^@^@^@8Í¢^K^@^@^@^@Øl<82>^Q^@^@^@^@XË¢^K^@^@^@^@Xq<82>^Q^@^@^@^@^Xi^W^H^@^@^@^@Xc<82>^Q^@^@^@^@¸Å¢^K^@^@^@^@8h<82>^Q^@^@^@^@<98>Т^K^@^@^@^@¨fÐ^Q^@^@^@^@ØÉ=^R^@^@^@^@ÀC<95>^M^@^@^@^@°S<95>^M^@^@^@^@^PI<95>^M^@^@^@^@À\\<95>^M^@^@^@^@ðE<95>^M^@^@^@^@<80>B<95>^M^@^@^@^@@P<95>^M^@^@^@^@<80>Q<95>^M^@^@^@^@ J<95>^M^@^@^@^@p\\<95>^M^@^@^@^@àU<95>^M^@^@^@^@àF<95>^M^@^@^@^@àA<95>^M^@^@^@^@^@<9e>ô^P^@^@^@^@°<9d>ô^P^@^@^@^@0<91>ô^P^@^@^@^@ <9e>ô^P^@^@^@^@^P<8e>ô^P^@^@^@^@ <88>ô^P^@^@^@^@Ð<82>ô^P^@^@^@^@ <8d>ô^P^@^@^@^@<90><95>ô^P^@^@^@^@à<90>ô^P^@^@^@^@@<95>ô^P^@^@^@^@P<8f>ô^P^@^@^@^@<90><81>ô^P^@^@^@^@ <97>ô^P^@^@^@^@Ð<8c>ô^P^@^@^@^@p<88>ô^P^@^@^@^@P<99>ô^P^@^@^@^@<90><90>ô^P^@^@^@^@@<9a>ô^P^@^@^@^@0<9b>ô^P^@^@^@'
>>>         };
>>> 
>>> 
>>>  // value: $VAR1 = '^F^L^@^@^@^@<98>Ûø^O^@^@^@^@Ø~^A^N^@^@^@^@<98>=H>^@^@^@^@ø<99>ó^K^@^@^@^@hÔu^R^@^@^@^@¸<8e>ó^K^@^@^@^@^Xä_^L^@^@^@^@Ø<90>a^G^@^@^@^@hðÉ^O^@^@^@^@8ã*^G^@^@^@^@ØØý^M^@^@^@^@Xùë^F^@^@^@^@^HÜý^M^@^@^@^@8W6^H^@^@^@^@øÐý^M^@^@^@^@xÿÃ^K^@^@^@^@X]i^O^@^@^@^@8^Mÿ^H^@^@^@^@Xû<98>^Q^@^@^@^@x¦h^H^@^@^@^@Xý<98>^Q^@^@^@^@^X=5^H^@^@^@^@^X¦ú^K^@^@^@^@^XVQ^P^@^@^@^@^H^Yû^N^@^@^@^@x¤h^H^@^@^@^@^Xå<98>^Q^@^@^@^@ø¤h^H^@^@^@^@Xé<98>^Q^@^@^@^@X¼h^H^@^@^@^@Ø¡h^H^@^@^@^@øf<82>^Q^@^@^@^@^X>éH^@^@^@^@xv<82>^Q^@^@^@^@X6éH^@^@^@^@xl<82>^Q^@^@^@^@83Ì^G^@^@^@^@Xl<82>^Q^@^@^@^@¸Ñý^M^@^@^@^@xr<82>^Q^@^@^@^@H[^H^Q^@^@^@^@^X|<82>^Q^@^@^@^@¸Ë¢^K^@^@^@^@¸u<82>^Q^@^@^@^@<98>Á¢^K^@^@^@^@Øp<82>^Q^@^@^@^@8Í¢^K^@^@^@^@Øl<82>^Q^@^@^@^@XË¢^K^@^@^@^@Xq<82>^Q^@^@^@^@^Xi^W^H^@^@^@^@Xc<82>^Q^@^@^@^@¸Å¢^K^@^@^@^@8h<82>^Q^@^@^@^@<98>Т^K^@^@^@^@¨fÐ^Q^@^@^@^@ØÉ=^R^@^@^@^@ÀC<95>^M^@^@^@^@°S<95>^M^@^@^@^@^PI<95>^M^@^@^@^@À\\<95>^M^@^@^@^@ðE<95>^M^@^@^@^@<80>B<95>^M^@^@^@^@@P<95>^M^@^@^@^@<80>Q<95>^M^@^@^@^@ J<95>^M^@^@^@^@p\\<95>^M^@^@^@^@àU<95>^M^@^@^@^@àF<95>^M^@^@^@^@àA<95>^M^@^@^@^@^@<9e>ô^P^@^@^@^@°<9d>ô^P^@^@^@^@0<91>ô^P^@^@^@^@ <9e>ô^P^@^@^@^@^P<8e>ô^P^@^@^@^@ <88>ô^P^@^@^@^@Ð<82>ô^P^@^@^@^@ <8d>ô^P^@^@^@^@<90><95>ô^P^@^@^@^@à<90>ô^P^@^@^@^@@<95>ô^P^@^@^@^@P<8f>ô^P^@^@^@^@<90><81>ô^P^@^@^@^@ <97>ô^P^@^@^@^@Ð<8c>ô^P^@^@^@^@p<88>ô^P^@^@^@^@P<99>ô^P^@^@^@^@<90><90>ô^P^@^@^@^@@<9a>ô^P^@^@^@^@0<9b>ô^P^@^@^@';
>>> 
>>> 
>>> headers: Connection: close
>>> Accept: text/html, application/xml;q=0.9, application/xhtml+xml, image/png, image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1
>>> Accept-Encoding: gzip, deflate
>>> Accept-Language: zh;q=0.9,en;q=0.8
>>> Host: wbc-inco.net
>>> User-Agent: Mozilla/5.0 (compatible; EasouSpider; +http://www.easou.com/search/spider.html)
>>> Content-Length: 927
>>> Content-Type: application/x-www-form-urlencoded
>>> REFER: http://b------.net/“
>>> 
>>> ————————————
>>> 
>>> to understand the logging above: this is what i added/changed in Catalyst::Plugin::Unicode::Encoding
>>> 
>>> ————————————————————
>>> around line 168:
>>> 
>>>         my $val;
>>>         eval {
>>>             $val = Encode::is_utf8( $value ) ? $value : $enc->decode( $value, $CHECK );
>>>         };
>>>         if ($@) {
>>>             # oops: decoding failed, so put the request details into the error
>>>             use Data::Dumper;
>>>             my $params  = $self->req->parameters;
>>>             my $headers = $self->req->headers->as_string;
>>>             die "UTF8 Error: $@ - \n\nURL: " . $self->req->path . "\n\nPARAMS:" . Dumper( $params ) . "\n\n // value: " . Dumper($value) . "\n\nheaders: " . $headers;
>>> ….
>>> ————————————————————
>>> 
>>> I guess my Catalyst Apps are not the only ones with these errors ?
>>> 
>>> 
>>> about my App settings / config:
>>> 
>>> app-config has
>>> 	encoding                UTF-8
>>> 
>>> App.pm does not load Unicode::Encoding anymore (since this is not needed when using the latest Catalyst: 5.90065)
>>> 
>>> i am using postgres with
>>> 	pg_enable_utf8 1
>>> (but the error above is far away from any DB related problem i guess)
>>> 
>>> using Catalyst::Plugin::Unicode::Encoding version 2.1 (coming with catalyst)
>>> 
>>> i just checked the tracker for catalyst on cpan, there is a UTF8 issue ticket
>>> 	https://rt.cpan.org/Public/Bug/Display.html?id=94957
>>> but it does not look like it is this problem ...
>>> 
>>> Any ideas what to do?
>>> Add an issue/ticket?
>>> 
>>> thanks for feedback,
>>> bernhard bauch	
>>> 
>>> 
>>> 
>>> Bernhard Bauch, Webdevelopment
>>> ZSI - Zentrum für soziale Innovation
>>> bauch at zsi.at
>>> Skype: berni-zsi
>>> 
>>> 
>>> _______________________________________________
>>> List: Catalyst at lists.scsys.co.uk
>>> Listinfo: http://lists.scsys.co.uk/cgi-bin/mailman/listinfo/catalyst
>>> Searchable archive: http://www.mail-archive.com/catalyst@lists.scsys.co.uk/
>>> Dev site: http://dev.catalyst.perl.org/
>>> 

—
Bernhard Bauch, Webdevelopment
ZSI - Zentrum für soziale Innovation
bauch at zsi.at
Skype: berni-zsi
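
On the utf8-sanitizer idea mentioned in the quoted message above: a minimal sketch, again in Python, of what such a step might do before parameters reach the application. The function names sanitize_utf8 and sanitize_params are hypothetical, the input is assumed to be raw byte strings, and this is not an existing Catalyst or Plack API. It replaces undecodable bytes with U+FFFD instead of raising; rejecting the request with a 400 would be an equally reasonable policy.

```python
def sanitize_utf8(raw: bytes) -> str:
    """Decode as UTF-8, turning invalid bytes into U+FFFD instead of raising."""
    return raw.decode('utf-8', errors='replace')


def sanitize_params(params: dict) -> dict:
    """Apply sanitize_utf8 to every raw parameter name and value."""
    return {sanitize_utf8(k): sanitize_utf8(v) for k, v in params.items()}


# Valid UTF-8 passes through unchanged.
clean = sanitize_utf8('深入 so what'.encode('utf-8'))

# GB2312 octets no longer raise; the undecodable part becomes U+FFFD.
dirty = sanitize_utf8('深入 so what'.encode('gb2312'))

print(clean)
print(dirty)
```

The original GB2312 text is lost this way, which is acceptable only if the goal is to stop the exceptions rather than to serve the crawler's query.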
