[Html-widget] UTF-8 and escaping

Adam Sjøgren adsj at novozymes.com
Tue Jun 20 16:43:15 CEST 2006


  Hi.


It looks like HTML::Widget, by way of HTML::Element, escapes each byte
of UTF-8 multibyte characters separately; for example:

 $ cat utf_8-1.07.pl 
 #!/usr/bin/perl

 use strict;
 use warnings;

 use HTML::Widget;

 my $v='Frække frølår';

 my $w=HTML::Widget->new('widget')->method('get')->action('/');
 my $e=$w->element('Textarea', 'mytext')->value($v);

 print $w->process->as_xml;
 $ ./utf_8-1.07.pl 
 <form action="/" id="widget" method="get"><fieldset><textarea class="textarea" cols="40" id="widget_mytext" name="mytext" rows="20">Fr&#195;&#166;kke fr&#195;&#184;l&#195;&#165;r</textarea></fieldset></form>
 $ 

(LANG is set to en_DK.UTF-8, so the locale is UTF-8.)

Is that supposed to happen? Is it possible to override?

The escaping is done in HTML::Element::_xml_escape, which does not
take multibyte character sets into account. As far as I can see, it
isn't called as a method, so it isn't easy to override.
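To make the byte-versus-character distinction concrete, here is a
small standalone Perl sketch. escape_non_ascii is a hypothetical
simplification, assumed to behave like the escaping described above
by replacing each unit outside printable ASCII with &#N; (N being its
ordinal); it is not HTML::Element's actual code. Applied to a string
of UTF-8 bytes it emits one entity per byte, as in the output above;
applied to a decoded character string it emits one entity per
character.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical stand-in (not HTML::Element's code): replace every unit
# outside printable ASCII with a numeric entity based on its ordinal.
sub escape_non_ascii {
    my ($s) = @_;
    $s =~ s/([^\x20-\x7E])/'&#' . ord($1) . ';'/ge;
    return $s;
}

# "æ" as a Perl character string: one code point, U+00E6 ...
my $chars = "\x{e6}";
# ... and the same text encoded as UTF-8: two bytes, 0xC3 0xA6.
my $bytes = encode('UTF-8', $chars);

print escape_non_ascii($bytes), "\n";   # &#195;&#166;
print escape_non_ascii($chars), "\n";   # &#230;
```

The escaper only ever sees units; whether those units are bytes or
characters depends entirely on what kind of string is passed in.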

How are people coping with this? I'm afraid I'm overlooking
something :*)


(For now I'm using a <%filter> section in my Mason autohandler that
translates the &#NNN; entities back, but that treats the symptom
rather than curing it...)
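The filter itself isn't shown, so the following is only a hypothetical
sketch of the kind of substitution such a workaround might perform, in
plain Perl rather than Mason syntax. The name unescape_high_bytes and
the choice to touch only entities 128..255 are assumptions: entities
below 128 are left alone because the escaper also turns characters
such as "&" and "<" into &#38; and &#60;, and those must stay escaped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical filter body: turn numeric entities 128..255 back into
# raw bytes, leaving all other numeric entities (e.g. &#38;) untouched.
sub unescape_high_bytes {
    my ($html) = @_;
    $html =~ s/&#(\d+);/($1 >= 128 && $1 <= 255) ? chr($1) : "&#$1;"/ge;
    return $html;
}

my $escaped  = 'Fr&#195;&#166;kke &#38; fr&#195;&#184;l&#195;&#165;r';
my $restored = unescape_high_bytes($escaped);

# $restored now holds the UTF-8 bytes of "Frække", then " &#38; ",
# then the UTF-8 bytes of "frølår"; the ampersand entity is preserved.
print $restored, "\n";
```

As the post says, this patches the symptom: a page that genuinely
contains, say, &#230; meant as a single Latin-1 character would also
be rewritten into one raw byte, which is not valid UTF-8 on its own.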


  Best regards,

   Adam

-- 
                                                          Adam Sjøgren
                                                    adsj at novozymes.com


