[Catalyst] HTML to plain text conversion

Wade.Stuart at fallon.com
Tue Jan 9 20:12:19 GMT 2007







Brian <catalyst at gzmarketing.com> wrote on 01/09/2007 12:17:14 PM:

>
> > Unfortunately it doesn't print href attributes of links.
> > I also tried HTML::Scrubber as proposed by Carl Franks, but
> > basically it keeps some tags we chose to allow.
> >
> > In fact, I'm looking for something that could convert my html file
> > to a plain text file, so that no markup is allowed at all.
> >
> > For example, a link like that:
> >
> > <a href="http://site.example">A link</a>
> >
> > would be transformed into something like:
> >
> > A link
> > http://site.example
> >
> > I'm sure that a module doing that exists on cpan.
> >
> > Thanks,
> > Xavier
> >

There are many ways to strip HTML down to plain text, but it sounds like
you want control over the tag data as well. Might I suggest looking at
HTML::TokeParser:

http://search.cpan.org/~gaas/HTML-Parser-3.55/lib/HTML/TokeParser.pm


With it, it is fairly easy to pull out the raw text and handle any other
tags (such as the href of a link) as you see fit.
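
As a rough illustration of that approach, here is a minimal sketch that
walks the token stream, keeps the text, and prints each link's href on the
line after the link text, as in Xavier's example. The helper name
html_to_text is made up for this example, and the output layout is only an
assumption about what is wanted. It does not filter out <script> or <style>
content, which would need extra handling.

```perl
use strict;
use warnings;
use HTML::TokeParser;
use HTML::Entities qw(decode_entities);

# Hypothetical helper (not part of any module): convert an HTML string to
# plain text, appending each link's href on its own line after the link text.
sub html_to_text {
    my ($html) = @_;

    # Passing a scalar reference makes the parser read from the string.
    my $parser = HTML::TokeParser->new(\$html)
        or die "Cannot create parser: $!";

    my $out = '';
    my @hrefs;    # hrefs of the <a> tags that are currently open

    while (my $token = $parser->get_token) {
        my $type = $token->[0];

        if ($type eq 'T') {
            # Text token: ["T", $text, $is_data]. The text is raw, so
            # decode entities such as &amp; ourselves.
            $out .= decode_entities($token->[1]);
        }
        elsif ($type eq 'S' && $token->[1] eq 'a') {
            # Start tag: ["S", $tag, $attr, $attrseq, $text].
            # Remember the href (may be undef for named anchors).
            push @hrefs, $token->[2]{href};
        }
        elsif ($type eq 'E' && $token->[1] eq 'a') {
            # End tag: ["E", $tag, $text]. Emit the saved href, if any.
            my $href = pop @hrefs;
            $out .= "\n$href\n" if defined $href;
        }
        # Every other tag, comment or declaration is simply dropped.
    }
    return $out;
}

print html_to_text('<a href="http://site.example">A link</a>');
```

Any other tag can be given its own branch in the same loop, for example
adding a newline on <br> or <p>, which is where the "as you see fit" part
comes in.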



