<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">here’s also a perl script that does it<div><br></div><div>------------------------------------------</div><div><div>use utf8; # the source below contains a literal non-ASCII string</div><div>use Encode qw(decode encode);</div><div>use LWP::UserAgent;</div><div><br></div><div>my $str = '深入 so what';</div><div>my $oct = encode("gb2312", $str);</div><div>my $url = '<a href="http://wbc-inco.net/object/event/past">http://wbc-inco.net/object/event/past</a>';</div><div>my $ua = LWP::UserAgent->new();</div><div>my $response = $ua->post( $url, { $oct => $oct } );</div><div>my $content = $response->decoded_content();</div><div>------------------------------------------</div><div><br></div><div><div>On 22 Jul 2014, at 11:33, Bernhard Bauch <<a href="mailto:bauch@zsi.at">bauch@zsi.at</a>> wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><meta http-equiv="Content-Type" content="text/html; charset=utf-8"><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">hey all,<div><br></div><div>this python3 script triggers the error ….</div><div><br></div><div>--------------------------------</div><div>import httplib2<br>import urllib.parse<br><br>somestr = '深入 so what'<br>encodedstr = somestr.encode('gb2312')<br>url = '<a href="http://myappdomain.com/search">http://myappdomain.com/search</a>'<br>body = { encodedstr:encodedstr }<br>headers = {<br> 'Content-type': 'application/x-www-form-urlencoded', <br> 'Accept': 'text/html, application/xml;q=0.9, application/xhtml+xml, image/png, image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1',<br> 'Accept-Encoding': 'gzip, deflate',<br> 'Accept-Language': 'zh;q=0.9,en;q=0.8'<br>}<br>http = httplib2.Http()<br>response, content = http.request(url, 'POST', headers=headers, body=urllib.parse.urlencode(body))</div><div>————————————————</div><div><br></div><div>now it's 
possible to reproduce the error :)</div><div><br></div><div>any ideas how to solve this?</div><div>ruby people solved this by adding a utf8-sanitizer in the middleware..</div><div><br></div><div>bye, bernhard</div><div><br></div><div><br><div><div>On 21 Jul 2014, at 22:19, Bernhard Bauch <<a href="mailto:bauch@zsi.at">bauch@zsi.at</a>> wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><meta http-equiv="Content-Type" content="text/html; charset=windows-1252"><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">more news..<div><br></div><div>the crawler/search engine that triggers these errors is</div><div><span class="Apple-tab-span" style="white-space: pre;">        </span><a href="http://easou.com/">http://easou.com</a></div><div><br></div><div>this search engine delivers its pages not in UTF-8 but in “gb2312”, which is Simplified Chinese.</div><div>if i open the “wrong utf8” parameters from the faulty requests as “gb2312”, some readable characters appear.</div><div>>> this leads me to: catalyst does not handle requests with gb2312 encoded parameters (because they are not utf8), and the request does not declare that it is encoded in anything other than utf8.</div><div><br></div><div>any ideas what to do?</div><div><br></div><div>bye, bernhard</div><div><br></div><div><br></div><div><br><div><div>On 21 Jul 2014, at 14:36, Roman Winfinit <<a href="mailto:winfinit@gmail.com">winfinit@gmail.com</a>> wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><p dir="ltr">Hello,</p><p dir="ltr">How are you running your application? E.g. mod_perl, fcgi, fcgi + httpd/nginx, plack + ... Also, what version of perl are you using and what OS?</p><p dir="ltr">-roman</p>
<div class="gmail_quote">On Jul 21, 2014 6:58 AM, "Bernhard Bauch" <<a href="mailto:bauch@zsi.at">bauch@zsi.at</a>> wrote:<br type="attribution"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div style="word-wrap:break-word">Hey all,<div><br></div><div>on most of my websites running on the latest Catalyst (5.90065) i always get utf8 related errors.</div><div>they usually appear when a spider </div><div><span style="white-space:pre-wrap">        </span>Mozilla/5.0 (compatible; EasouSpider; +<a href="http://www.easou.com/search/spider.html" target="_blank">http://www.easou.com/search/spider.html</a>)</div>
<div>comes across.</div><div><br></div><div>the error is:</div><div><span style="white-space:pre-wrap">        </span>Caught exception in engine "UTF8 Error: utf8 "\x98" does not map to Unicode at /usr/local/…./lib/perl5/Catalyst/Plugin/Unicode/Encoding.pm line 167.</div>
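<div><br></div><div>For context: a message of this shape is what a strict UTF-8 decoder reports when it is handed bytes that are not valid UTF-8. A minimal Python 3 sketch of that behaviour follows, assuming (as suspected in the follow-ups) that the client sends GB2312 bytes. The sample string comes from the reproduction scripts in this thread; the helper name try_utf8 is made up for illustration, and nothing here touches Catalyst itself.</div>

```python
# Sample text from the reproduction scripts in this thread.
sample = '深入 so what'

# Bytes as a GB2312 client would put them on the wire (assumption).
gb_bytes = sample.encode('gb2312')


def try_utf8(raw):
    """Hypothetical helper: return decoded text, or None if raw is not valid UTF-8."""
    try:
        return raw.decode('utf-8')  # strict error handling is the default
    except UnicodeDecodeError:
        return None


# Strict UTF-8 decoding rejects the GB2312 bytes.
strict_result = try_utf8(gb_bytes)

# Decoding with the encoding that was actually used recovers the text.
recovered = gb_bytes.decode('gb2312')

# Lossy alternative: invalid sequences become U+FFFD instead of raising.
lossy = gb_bytes.decode('utf-8', errors='replace')
```

<div>Here strict decoding fails, decoding as GB2312 recovers the text, and errors='replace' is the lossy way to avoid the exception.</div>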
<div><br></div><div>It took me a while to get the actual parameters the spider sends, because Catalyst's debug messages do not tell that much:</div><div><br></div><div>—————————————</div><div>[2014/07/16 15:08:47] [5.255.253.218] [INFO] vim /usr/local/…./lib/perl5/Catalyst.pm +2016: *** Request 164 (0.032/s) [10682] [Wed Jul 16 15:08:47 2014] ***<br>
[2014/07/16 15:08:47] [5.255.253.218] [DEBUG] vim /usr/local/…./lib/perl5/Catalyst.pm +2309: Response Code: 400; Content-Type: text/plain; charset=UTF-8; Content-Length: unknown<br>[2014/07/16 15:08:47] [5.255.253.218] [INFO] vim /usr/local/.../lib/perl5/Catalyst.pm +1880: Request took 0.006491s (154.059/s)<br>
.---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------.<br>| Action | Time |<br>
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+<br>'---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------'</div>
<div>—————————————</div><div><br></div><div>i changed the Plugin::Unicode::Encoding plugin a bit to find out what the client sends … the results are these:</div><div>UTF8 trash arrives, and the module seems unable to deal with it…</div>
<div><br></div><div>————————————</div><div>Caught exception in engine "UTF8 Error: utf8 "\x98" does not map to Unicode at /usr/local/…../lib/perl5/Catalyst/Plugin/Unicode/Encoding.pm line 170.<br> -<br><br>
URL: notice/list<br><br>PARAMS:$VAR1 = {<br> 'X*Ö^K^@^@^@^@¸®ä<br>^@^@^@^@8<83>^H^K^@^@^@^@h¡ä<br>^@^@^@^@Hµä<br>^@^@^@^@X^Z^N^Q^@^@^@^@ø<91>^F^Q^@^@^@^@Ø^F^N^Q^@^@^@^@¸<92>^F^Q^@^@^@^@(^K^N^Q^@^@^@^@<88>^B^N^Q^@^@^@^@¸úÝ^P^@^@^@^@^X%q^G^@^@^@^@اñ^O^@^@^@^@ØøB.^@^@^@^@èâÝ^P^@^@^@^@XÛ_^L^@^@^@^@ÈíÝ^P^@^@^@^@¸~P^S^@^@^@^@èåÝ^P^@^@^@^@Øný^O^@^@^@^@<88>úÝ^P^@^@^@^@^Xá( ^@^@^@^@ئÆ<br>
^@^@^@^@Øï*^Q^@^@^@^@^X' => '^F^L^@^@^@^@<98>Ûø^O^@^@^@^@Ø~^A^N^@^@^@^@<98>=H>^@^@^@^@ø<99>ó^K^@^@^@^@hÔu^R^@^@^@^@¸<8e>ó^K^@^@^@^@^Xä_^L^@^@^@^@Ø<90>a^G^@^@^@^@hðÉ^O^@^@^@^@8ã*^G^@^@^@^@ØØý^M^@^@^@^@Xùë^F^@^@^@^@^HÜý^M^@^@^@^@8W6^H^@^@^@^@øÐý^M^@^@^@^@xÿÃ^K^@^@^@^@X]i^O^@^@^@^@8^Mÿ^H^@^@^@^@Xû<98>^Q^@^@^@^@x¦h^H^@^@^@^@Xý<98>^Q^@^@^@^@^X=5^H^@^@^@^@^X¦ú^K^@^@^@^@^XVQ^P^@^@^@^@^H^Yû^N^@^@^@^@x¤h^H^@^@^@^@^Xå<98>^Q^@^@^@^@ø¤h^H^@^@^@^@Xé<98>^Q^@^@^@^@X¼h^H^@^@^@^@Ø¡h^H^@^@^@^@øf<82>^Q^@^@^@^@^X>éH^@^@^@^@xv<82>^Q^@^@^@^@X6éH^@^@^@^@xl<82>^Q^@^@^@^@83Ì^G^@^@^@^@Xl<82>^Q^@^@^@^@¸Ñý^M^@^@^@^@xr<82>^Q^@^@^@^@H[^H^Q^@^@^@^@^X|<82>^Q^@^@^@^@¸Ë¢^K^@^@^@^@¸u<82>^Q^@^@^@^@<98>Á¢^K^@^@^@^@Øp<82>^Q^@^@^@^@8Í¢^K^@^@^@^@Øl<82>^Q^@^@^@^@XË¢^K^@^@^@^@Xq<82>^Q^@^@^@^@^Xi^W^H^@^@^@^@Xc<82>^Q^@^@^@^@¸Å¢^K^@^@^@^@8h<82>^Q^@^@^@^@<98>Т^K^@^@^@^@¨fÐ^Q^@^@^@^@ØÉ=^R^@^@^@^@ÀC<95>^M^@^@^@^@°S<95>^M^@^@^@^@^PI<95>^M^@^@^@^@À\\<95>^M^@^@^@^@ðE<95>^M^@^@^@^@<80>B<95>^M^@^@^@^@@P<95>^M^@^@^@^@<80>Q<95>^M^@^@^@^@ J<95>^M^@^@^@^@p\\<95>^M^@^@^@^@àU<95>^M^@^@^@^@àF<95>^M^@^@^@^@àA<95>^M^@^@^@^@^@<9e>ô^P^@^@^@^@°<9d>ô^P^@^@^@^@0<91>ô^P^@^@^@^@ <9e>ô^P^@^@^@^@^P<8e>ô^P^@^@^@^@ <88>ô^P^@^@^@^@Ð<82>ô^P^@^@^@^@ <8d>ô^P^@^@^@^@<90><95>ô^P^@^@^@^@à<90>ô^P^@^@^@^@@<95>ô^P^@^@^@^@P<8f>ô^P^@^@^@^@<90><81>ô^P^@^@^@^@ <97>ô^P^@^@^@^@Ð<8c>ô^P^@^@^@^@p<88>ô^P^@^@^@^@P<99>ô^P^@^@^@^@<90><90>ô^P^@^@^@^@@<9a>ô^P^@^@^@^@0<9b>ô^P^@^@^@'<br>
};<br><br><br> // value: $VAR1 = '^F^L^@^@^@^@<98>Ûø^O^@^@^@^@Ø~^A^N^@^@^@^@<98>=H>^@^@^@^@ø<99>ó^K^@^@^@^@hÔu^R^@^@^@^@¸<8e>ó^K^@^@^@^@^Xä_^L^@^@^@^@Ø<90>a^G^@^@^@^@hðÉ^O^@^@^@^@8ã*^G^@^@^@^@ØØý^M^@^@^@^@Xùë^F^@^@^@^@^HÜý^M^@^@^@^@8W6^H^@^@^@^@øÐý^M^@^@^@^@xÿÃ^K^@^@^@^@X]i^O^@^@^@^@8^Mÿ^H^@^@^@^@Xû<98>^Q^@^@^@^@x¦h^H^@^@^@^@Xý<98>^Q^@^@^@^@^X=5^H^@^@^@^@^X¦ú^K^@^@^@^@^XVQ^P^@^@^@^@^H^Yû^N^@^@^@^@x¤h^H^@^@^@^@^Xå<98>^Q^@^@^@^@ø¤h^H^@^@^@^@Xé<98>^Q^@^@^@^@X¼h^H^@^@^@^@Ø¡h^H^@^@^@^@øf<82>^Q^@^@^@^@^X>éH^@^@^@^@xv<82>^Q^@^@^@^@X6éH^@^@^@^@xl<82>^Q^@^@^@^@83Ì^G^@^@^@^@Xl<82>^Q^@^@^@^@¸Ñý^M^@^@^@^@xr<82>^Q^@^@^@^@H[^H^Q^@^@^@^@^X|<82>^Q^@^@^@^@¸Ë¢^K^@^@^@^@¸u<82>^Q^@^@^@^@<98>Á¢^K^@^@^@^@Øp<82>^Q^@^@^@^@8Í¢^K^@^@^@^@Øl<82>^Q^@^@^@^@XË¢^K^@^@^@^@Xq<82>^Q^@^@^@^@^Xi^W^H^@^@^@^@Xc<82>^Q^@^@^@^@¸Å¢^K^@^@^@^@8h<82>^Q^@^@^@^@<98>Т^K^@^@^@^@¨fÐ^Q^@^@^@^@ØÉ=^R^@^@^@^@ÀC<95>^M^@^@^@^@°S<95>^M^@^@^@^@^PI<95>^M^@^@^@^@À\\<95>^M^@^@^@^@ðE<95>^M^@^@^@^@<80>B<95>^M^@^@^@^@@P<95>^M^@^@^@^@<80>Q<95>^M^@^@^@^@ J<95>^M^@^@^@^@p\\<95>^M^@^@^@^@àU<95>^M^@^@^@^@àF<95>^M^@^@^@^@àA<95>^M^@^@^@^@^@<9e>ô^P^@^@^@^@°<9d>ô^P^@^@^@^@0<91>ô^P^@^@^@^@ <9e>ô^P^@^@^@^@^P<8e>ô^P^@^@^@^@ <88>ô^P^@^@^@^@Ð<82>ô^P^@^@^@^@ <8d>ô^P^@^@^@^@<90><95>ô^P^@^@^@^@à<90>ô^P^@^@^@^@@<95>ô^P^@^@^@^@P<8f>ô^P^@^@^@^@<90><81>ô^P^@^@^@^@ <97>ô^P^@^@^@^@Ð<8c>ô^P^@^@^@^@p<88>ô^P^@^@^@^@P<99>ô^P^@^@^@^@<90><90>ô^P^@^@^@^@@<9a>ô^P^@^@^@^@0<9b>ô^P^@^@^@';<br>
<br><br>headers: Connection: close<br>Accept: text/html, application/xml;q=0.9, application/xhtml+xml, image/png, image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1<br>Accept-Encoding: gzip, deflate<br>Accept-Language: zh;q=0.9,en;q=0.8<br>
Host: <a href="http://wbc-inco.net/" target="_blank">wbc-inco.net</a><br>User-Agent: Mozilla/5.0 (compatible; EasouSpider; +<a href="http://www.easou.com/search/spider.html" target="_blank">http://www.easou.com/search/spider.html</a>)<br>
Content-Length: 927<br>Content-Type: application/x-www-form-urlencoded<br>REFER: <a href="http://b------.net/%E2%80%9C" target="_blank">http://b------.net/“</a></div><div><br></div><div>————————————<br><br></div><div>to understand the logging above: this is what i added/changed in Catalyst::Plugin::Unicode::Encoding</div>
<div><br></div><div>————————————————————</div><div>around line 168:</div><div><br></div><div> my $val;<br> eval {<br> $val = Encode::is_utf8( $value ) ? $value : $enc->decode( $value, $CHECK );<br>
};<br> if ($@){<br> # UPS !<br> # get request infos<br>use Data::Dumper;<br>my $params = $self->req->parameters;<br>my $headers= $self->req->headers->as_string;<br>die "UTF8 Error: $@ - \n\nURL: " . $self->req->path . "\n\nPARAMS:" . Dumper( $params ) . "\n\n // value: " . Dumper($value) . "\n\nheaders: " . $headers;<br>
….</div><div>————————————————————</div><div><br></div><div>I guess my Catalyst Apps are not the only ones with these errors ?</div><div><br></div><div><br></div><div>about my App settings / config:</div><div><br></div><div>
app-config has</div><div><span style="white-space:pre-wrap">        </span>encoding UTF-8</div><div><br></div><div>App.pm does not load Unicode::Encoding anymore (since this is not needed when using the latest Catalyst, 5.90065)</div>
<div><br></div><div>i am using postgres with</div><div><span style="white-space:pre-wrap">        </span>pg_enable_utf8 1</div><div>(but the error above is far away from any DB related problem i guess)</div><div><br></div><div>using Catalyst::Plugin::Unicode::Encoding version 2.1 (coming with Catalyst)</div>
<div><br></div><div>i just checked out the tracker for catalyst on cpan, there is a UTF8 issue ticket</div><div><span style="white-space:pre-wrap">        </span><a href="https://rt.cpan.org/Public/Bug/Display.html?id=94957" target="_blank">https://rt.cpan.org/Public/Bug/Display.html?id=94957</a></div>
<div>but it does not look like it is this problem ...</div><div><br></div><div>Any ideas what to do?</div><div>Add an issue/ticket?</div><div><br></div><div>thanks for feedback,</div><div>bernhard bauch<span style="white-space:pre-wrap">        </span></div>
<div><br></div><div><br></div><div><br><div>
<div style="letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; word-wrap: break-word;"><div>—</div><div>Bernhard Bauch, Webdevelopment<br>ZSI - Zentrum für soziale Innovation<br>
<a href="mailto:bauch@zsi.at" target="_blank">bauch@zsi.at</a><br>Skype: berni-zsi</div></div>
</div>
<br></div></div><br>_______________________________________________<br>
List: <a href="mailto:Catalyst@lists.scsys.co.uk">Catalyst@lists.scsys.co.uk</a><br>
Listinfo: <a href="http://lists.scsys.co.uk/cgi-bin/mailman/listinfo/catalyst" target="_blank">http://lists.scsys.co.uk/cgi-bin/mailman/listinfo/catalyst</a><br>
Searchable archive: <a href="http://www.mail-archive.com/catalyst@lists.scsys.co.uk/" target="_blank">http://www.mail-archive.com/catalyst@lists.scsys.co.uk/</a><br>
Dev site: <a href="http://dev.catalyst.perl.org/" target="_blank">http://dev.catalyst.perl.org/</a><br>
<br></blockquote></div>
</blockquote></div><br><div>
</div>
<br></div></div></blockquote></div><br><div>
</div>
<br></div></div></blockquote></div><br><div>
<div style="color: rgb(0, 0, 0); letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;"><div>—</div><div>Bernhard Bauch, Webdevelopment<br>ZSI - Zentrum für soziale Innovation<br><a href="mailto:bauch@zsi.at">bauch@zsi.at</a><br>Skype: berni-zsi</div></div>
</div>
<br></div></body></html>