[Catalyst] Alien::Dojo uses regexes to parse HTML, so what?

Dominique Quatravaux dom at idealx.com
Tue May 30 14:03:30 CEST 2006

Matt S Trout wrote:

>> my ($url) =
>> qr{href="(http://download.dojotoolkit.org/release[^"]+)"}sx
> <a href="http://download.dojotoolkit.org/release-notes.txt">
> Congratulations, you're toast.

This is beside the point for at least three reasons:

    * this won't happen in real life, as the maintainer of
      download.dojotoolkit.org apparently knows about directories,
    * my regex can be cured (C<< ...\.zip)"}sx >> if I must); this was
      just a throwaway example, not a snippet of something that I
      intend to put in 0.02,
    * I fail to see how a full-fledged HTML parser would make any
      difference here!
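
For illustration, here is a minimal sketch of the "cured" pattern run
against the counterexample above. The sample page, the helper name
find_release_url and the 0.0.0 version number are all made up for this
sketch; only the release-notes.txt link comes from the thread.

```perl
use strict;
use warnings;

# Hypothetical page content: the release-notes link is the counterexample
# from the thread; the .zip link and its version number are invented.
my $html = <<'HTML';
<a href="http://download.dojotoolkit.org/release-notes.txt">notes</a>
<a href="http://download.dojotoolkit.org/release-0.0.0/dojo-0.0.0.zip">zip</a>
HTML

# Return the first release URL ending in .zip, or undef if there is none.
sub find_release_url {
    my ($page) = @_;
    my ($url) = $page =~ m{
        href="                                       # start of the attribute
        ( http://download\.dojotoolkit\.org/release  # expected URL prefix
          [^"]+                                      # anything up to the quote
          \.zip )                                    # must be a zip archive
        "                                            # closing quote
    }sx;
    return $url;
}

my $url = find_release_url($html);
print defined $url ? "$url\n" : "no release found\n";
```

Because C<< [^"]+ >> cannot cross the closing quote, the release-notes.txt
link is skipped and only the .zip link is captured.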

> Get a canonical address from the dojo maintainers,

I am not discussing that, because I wholeheartedly agree: this is IMO
the best thing proposed so far.

> or at the very least consider a lightweight SGMLish parsing job.

Still not sold on this one.

> Regexps are only sane for hacky one-off scripts,

I am very surprised to hear that from a top contributor to a framework
written *in Perl*: pardon me, but this particular statement just
sounds like flamebait from a Python or Ruby zealot. Hopefully there is
more to your opinion that you are willing to discuss on-list?

> at least certainly not for production use.

Despite all the respect I have for your work, I simply cannot agree. I
*do* use regexes in production, sometimes even for parsing (not HTML):
they are written with //x, rife with comments, and covered by a
suitable amount of unit tests (more than I write for pure-OO code),
and they just do the job.
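
To make that concrete, here is a rough sketch of the style described
above: a commented //x regex for a non-HTML parsing job, with its unit
tests next to it. The "key = value" line format and the names
$CONFIG_LINE and parse_config_line are hypothetical, chosen only for
this example; Test::More ships with Perl.

```perl
use strict;
use warnings;
use Test::More tests => 4;

# Hypothetical format: one "key = value" setting per line.
my $CONFIG_LINE = qr{
    \A \s*
    ( [A-Za-z_] \w* )   # key: an identifier
    \s* = \s*           # separator, optional whitespace around it
    ( .*? )             # value: everything up to trailing whitespace
    \s* \z
}x;

# Return (key, value) for a well-formed line, or an empty list otherwise.
sub parse_config_line {
    my ($line) = @_;
    my ($key, $value) = $line =~ $CONFIG_LINE or return;
    return ($key, $value);
}

is_deeply([parse_config_line("name = dojo")],     ["name", "dojo"],   "plain pair");
is_deeply([parse_config_line("  path=/tmp/x  ")], ["path", "/tmp/x"], "whitespace trimmed");
is_deeply([parse_config_line("empty =")],         ["empty", ""],      "empty value");
is_deeply([parse_config_line("= nokey")],         [],                 "missing key rejected");
```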

--
Dominique QUATRAVAUX                           Ingénieur senior
01 44 42 00 08                                 IDEALX
