[Catalyst] What do you guys use for sanitizing HTML input

J. Shirley jshirley at gmail.com
Sun Jul 19 15:45:31 GMT 2009


On Sun, Jul 19, 2009 at 1:29 AM, Devin Austin <devin.austin at gmail.com> wrote:

> HTML::StripScripts is nice
>
>
> On Sun, Jul 19, 2009 at 2:15 AM, Zbigniew Lukasiak <zzbbyy at gmail.com> wrote:
>
>> Hi,
>>
>> There seems to be a long list of HTML sanitizers on CPAN and no guide.
>> So I quickly made a list at the P5P wiki:
>> http://www.perlfoundation.org/perl5/index.cgi?html_sanitazing and I am
>> asking here what your experiences with the subject are.
>>
>> Some time ago I wrote a sanitizer for HTML::FormHandler based on
>> HTML::Scrubber, but there seem to be problems with installing it,
>> so it never got into the HTML::FormHandler repository.
>> I noticed that there is a new HTML sanitizer bundled with MojoMojo:
>> http://search.cpan.org/~mramberg/MojoMojo-0.999030/lib/HTML/Declaw.pm
>> by our own Marcus Ramberg.  The POD says it is a modified version of
>> HTML::Defang, but there is no indication of what was actually modified
>> or why it was forked.
>>
>

I've had good luck with HTML::Scrubber, after running it through enough unit
tests and usability tests.

Here's the config I've found to be "safe" for UCC (YAML format):

    HTML::Scrubber:
        allow:
            - a
            - b
            - strong
            - em
            - i
            - img
            - br
            - p
            - span
        rules:
            a:
                href: 1
                name: 1
                target: !!perl/regexp (?i-xsm:^target$)
            b:
                *: 0
            img:
                src: 1
                alt: 1
                title: 1
                *: 0
            em:
                *: 0
            br:
                *: 0
            p:
                *: 0
            strong:
                *: 0
            span:
                *: 0
                style: !!perl/regexp '(?si-xism:^(?:color:\s*#[a-fA-F0-9]{6};?|text-decoration:\s*underline;?)$)'

__END__
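
For anyone who would rather build the scrubber in code than load it from
YAML, here is a rough sketch of how the config above might map onto the
HTML::Scrubber constructor. The variable names and the sample input are made
up for illustration, and the regex flags from the YAML dump are left out on
the assumption that they are not needed for these patterns.

```perl
use strict;
use warnings;
use HTML::Scrubber;

# Style values allowed on <span>: a six-digit hex colour or an underline.
# Assumption: the flags in the YAML dump are not needed for this pattern.
my $span_style =
    qr/^(?:color:\s*#[a-fA-F0-9]{6};?|text-decoration:\s*underline;?)$/;

my $scrubber = HTML::Scrubber->new(
    # Tags that may pass through; everything else is dropped.
    allow => [qw( a b strong em i img br p span )],
    # Per-tag attribute rules; '*' => 0 drops any attribute not listed.
    rules => [
        # The target pattern is copied as-is from the config above.
        a      => { href => 1, name => 1, target => qr/^target$/i },
        img    => { src => 1, alt => 1, title => 1, '*' => 0 },
        span   => { style => $span_style, '*' => 0 },
        b      => { '*' => 0 },
        em     => { '*' => 0 },
        br     => { '*' => 0 },
        p      => { '*' => 0 },
        strong => { '*' => 0 },
    ],
);

# Hypothetical user input mixing allowed and disallowed markup.
my $dirty = join '',
    '<p onclick="x()">Hi <a href="/home" onmouseover="steal()">home</a> ',
    '<span style="color: #ff0000;">red</span>',
    '<span style="position: absolute">bad</span>',
    '<script>alert(1)</script></p>';

my $clean = $scrubber->scrub($dirty);
print "$clean\n";
```

The allow list decides which tags survive, and the rules decide which
attributes survive on each of them, so a disallowed style value is dropped
while the span itself stays.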

-J