[Html-widget] UTF-8 and escaping

Bernhard Graf html-widget at augensalat.de
Tue Jun 20 17:03:02 CEST 2006


On Tuesday 20 June 2006 16:43, Adam Sjøgren wrote:

> It looks like HTML::Widget, by way of HTML::Element, escapes the
> individual bytes of utf-8 multibyte characters; example:
>
>  $ cat utf_8-1.07.pl
>  #!/usr/bin/perl
>
>  use strict;
>  use warnings;
>
>  use HTML::Widget;
>
>  my $v='Frække frølår';
>
>  my $w=HTML::Widget->new('widget')->method('get')->action('/');
>  my $e=$w->element('Textarea', 'mytext')->value($v);
>
>  print $w->process->as_xml;
>  $ ./utf_8-1.07.pl
>  <form action="/" id="widget" method="get"><fieldset><textarea
> class="textarea" cols="40" id="widget_mytext" name="mytext"
> rows="20">Fr&#195;&#166;kke
> fr&#195;&#184;l&#195;&#165;r</textarea></fieldset></form> $
>
> (LANG is set to LANG=en_DK.UTF-8, so the locale is UTF-8).
>
> Is that supposed to happen; possible to override?
>
> The escaping is done, in a way that does not take multibyte character
> sets into account, in HTML::Element::_xml_escape - which isn't called
> as a method, so it isn't easily overridable, as far as I can see.
>
> How are people coping with this? I'm afraid I'm overlooking something
>
> :*)

My solution is patching. I might be wrong, but I regard
HTML::Element::_xml_escape() as broken, as do others who have filed
reports at http://rt.cpan.org/Public/Dist/Display.html?Name=HTML-Tree.
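
For illustration, here is a minimal sketch of why each byte ends up as its own entity. The function escape_non_ascii is a hypothetical stand-in that only mimics the kind of numeric-entity escaping visible in the output above; it is not the actual HTML::Element code and not the attached patch. The sketch assumes the string reaches the escaping step as raw UTF-8 bytes rather than as decoded characters.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical stand-in for the escaping step: every character outside
# printable ASCII becomes a numeric entity. Not the real HTML::Element code.
sub escape_non_ascii {
    my ($text) = @_;
    $text =~ s/([^\x20-\x7E])/'&#' . ord($1) . ';'/ge;
    return $text;
}

# The UTF-8 encoded bytes of the first word from the example,
# spelled as byte escapes (0xC3 0xA6 is the encoding of U+00E6).
my $bytes = "Fr\xC3\xA6kke";

# Without decoding, Perl treats each byte as one character,
# so the two bytes are escaped separately.
my $from_bytes = escape_non_ascii($bytes);

# After decoding, the two bytes are the single character U+00E6,
# which is escaped as one entity.
my $from_chars = escape_non_ascii(decode('UTF-8', $bytes));

print "$from_bytes\n";   # Fr&#195;&#166;kke
print "$from_chars\n";   # Fr&#230;kke
```

If the same assumption holds for the script above, decoding the value first (or declaring `use utf8;` in a source file saved as UTF-8) should give the escaping step characters instead of bytes. Whether that is preferable to patching is a separate question.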


Patch attached.
-- 
Bernhard Graf
-------------- next part --------------
A non-text attachment was scrubbed...
Name: HTML-Tree.patch
Type: text/x-diff
Size: 450 bytes
Desc: not available
Url : http://lists.rawmode.org/pipermail/html-widget/attachments/20060620/0b24d780/attachment.bin 

