[Catalyst] Alien::Dojo uses regexes to parse HTML, so what?

Dominique Quatravaux dom at idealx.com
Mon May 29 19:14:30 CEST 2006


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

phaylon wrote:

> Dominique Quatravaux said:
>
>> I rest my case, unless someone can provide compelling reasons for
>> avoiding regexes *in general* for this task.
>
>
> mst gave only one to demonstrate the whole problem. It's like a
> big, lightsucking black hole.

No it's not. We are not trying to address the problem of parsing HTML
in general, we are trying to address the problem of parsing *one
single page*. Since I apparently have to be that explicit to make my
point, consider

  my ($url) = qr{<a [^>]+
                 href="(http://download.dojotoolkit.org/release[^"]+)"}sx

or even

  my ($url) = qr{href="(http://download.dojotoolkit.org/release[^"]+)"}sx

and pray tell me what's wrong with those. HTML is a *text* language,
for chrissake, it was designed *purposefully* so that I am able to do
that sort of thing.
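
For concreteness, here is a minimal sketch of how the second pattern might be applied, assuming the page source has already been read into a string. The `$html` sample and the release file name in it are invented for illustration; they are not taken from the real download page or from Alien::Dojo itself. The dots are escaped here so that they only match literal dots.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the fetched page; the file name is made up.
my $html = <<'HTML';
<html><body>
<a class="dl" href="http://download.dojotoolkit.org/release-0.0.0/dojo-example.tar.gz">Download</a>
</body></html>
HTML

# In list context the match returns its captures, so $url receives the
# first href that points under the release path. /x ignores the layout
# whitespace inside the pattern; /s lets "." span newlines (unused here).
my ($url) = $html =~ m{
    href="
    ( http://download\.dojotoolkit\.org/release [^"]+ )
    "
}sx;

die "no release link found\n" unless defined $url;
print "$url\n";
```

If the page layout changes so that no such link exists, `$url` stays undefined and the script dies rather than continuing with a bad value.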


- --
Dominique QUATRAVAUX                           Ingénieur senior
01 44 42 00 08                                 IDEALX

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFEeyv1MJAKAU3mjcsRAlHLAJ9z+4e+CqUeZDT8FMsIpai+O/boQwCgswRU
/iA8vhOertixG59MnvIn8/s=
=K1CT
-----END PGP SIGNATURE-----




