[Catalyst] HTML to plain text conversion

Brian catalyst at gzmarketing.com
Tue Jan 9 18:17:14 GMT 2007


> Unfortunately it doesn't print href attributes of links.
> I also tried HTML::Scrubber as proposed by Carl Franks, but basically it keeps 
> some tags we chose to allow.
>
> In fact, I'm looking for something that could convert my html file to a plain 
> text file, so that no markup is allowed at all.
>
> For example, a link like that:
>
> <a href="http://site.example">A link</a>
>
> would be transformed into something like:
>
> A link
> http://site.example
>
> I'm sure that a module doing that exists on cpan.
>
> Thanks,
> Xavier
>   


I'm brand spankin' new to Catalyst and haven't worked with Perl for 5 
years, so I can't give you a suggestion within the framework or CPAN.

But if you're looking at doing a one-time conversion of an HTML file to 
text, you could shell out to lynx using Perl's backtick operator:

`lynx -dump http://www.urltoconverttotext.com`

Or

`lynx -dump /path/to/file.html`

and capture what is returned.
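
For what it's worth, here is a rough sketch of what capturing the output 
could look like in Perl. The helper name html_to_text and its optional 
$program argument are made up for illustration, and it assumes a lynx 
binary is installed and on the PATH. It uses a list-form pipe open rather 
than backticks so the URL or path never passes through a shell. As far as 
I know, lynx -dump appends a numbered "References" list of link targets by 
default, which should cover the href requirement.

```perl
use strict;
use warnings;

# Hypothetical helper: run "<program> -dump <source>" and return its output.
# $source is a URL or a local file path; $program defaults to 'lynx'
# (assumed to be installed and on PATH).
sub html_to_text {
    my ($source, $program) = @_;
    $program //= 'lynx';

    # List-form pipe open: arguments go straight to the program,
    # so $source is not interpreted by a shell.
    open(my $fh, '-|', $program, '-dump', $source)
        or die "Cannot run $program: $!";

    # Slurp everything the program printed.
    my $text = do { local $/; <$fh> };

    # close() on a pipe fails if the program exited non-zero.
    close($fh) or die "$program failed (exit status $?)";

    return $text;
}

# Convert the file or URL given on the command line, if any.
print html_to_text($ARGV[0]) if @ARGV;
```

Saved as, say, html2text.pl, it would be run as 
"perl html2text.pl /path/to/file.html" and the text redirected to a file.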

You wouldn't want to do this every time a page is viewed, however, since 
each call starts a separate lynx process.

Brian



