[Catalyst] Alien::Dojo uses regexes to parse HTML, so what?

Daniel McBrearty danielmcbrearty at gmail.com
Mon May 29 21:09:31 CEST 2006

> Well, *I am* a software engineer, and KISS (Keep It Simple, Stupid) is
> among the things I advocate. I hear I'm not alone at that.

Well, the accepted wisdom in both software and other fields of engineering
has been there for a long time: use components, except where there is a
*really* good reason not to. Components improve reliability. Components
reduce headaches.

You might see your solution as being simpler, but I doubt that someone
using your module would feel the same. Regexes are great tools when you need
to work on arbitrary text; but this is not arbitrary, this is HTML. If I
had to look at or maintain your code in future, I'd rather see a reference to
a module with a documented API than a regex. It's, erm, simpler. And the
fact that it is a *component* makes it more trustworthy if you suspect a

> In the current instance (repeat: *in the current instance*), using an
> HTML parser would just result in gratuitous bloat

> in Foo::Dojo's tarball,

By the time you are using a framework like Catalyst, you are not that likely
to care about adding one more module either way. Just to add a feature like
session authentication, you typically add 3 or 4; if you choose a common
one, there is a reasonable chance that it will already be used somewhere
anyway on a site of any complexity.

And have you actually quantified this? Many modules are surprisingly small.

> maintenance headaches

Why? There are a lot more people involved in the design of whatever CPAN
parser you use, and ... guess what! It's already been used in hundreds of
other programs out there!

> TIMTOWTDI, remember?

Yes. But that doesn't mean that all ways to do it are equal. I think a regex
would be fine for a script that is only for your own use, or a throwaway
tool. But for a module for public submission? It just doesn't make sense.
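
To make that concrete, here is a rough sketch of the kind of input that tends
to trip up a hand-written pattern. It is in Python rather than Perl, and
nothing in it is taken from the module under discussion: the sample markup,
the regex and the helper names are all made up for illustration. The parser
used is the one from Python's standard library.

```python
import re
from html.parser import HTMLParser

# Hypothetical markup: the src attribute is not the first attribute and
# uses single quotes. Both are perfectly legal HTML.
SAMPLE = "<p>intro</p><script type=\"text/javascript\" src='dojo.js'></script>"

# A naive hand-written pattern (an assumption for illustration only):
# it expects src to come first and to be double-quoted.
NAIVE_SCRIPT_SRC = re.compile(r'<script\s+src="([^"]*)"[^>]*>')


def srcs_by_regex(html):
    """Return script src values found by the naive regex."""
    return NAIVE_SCRIPT_SRC.findall(html)


class ScriptSrcCollector(HTMLParser):
    """Collect the src attribute of every <script> start tag."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs, already unquoted.
        if tag == "script":
            for name, value in attrs:
                if name == "src":
                    self.srcs.append(value)


def srcs_by_parser(html):
    """Return script src values found by the stdlib HTML parser."""
    collector = ScriptSrcCollector()
    collector.feed(html)
    collector.close()
    return collector.srcs
```

On the sample above the regex version finds nothing, while the parser version
finds dojo.js. The regex could of course be patched for this case; the point
is only that each such patch is one more thing to maintain yourself.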

> And as I argued in the other
> email I don't expect an infinite stream of bugs to pour from that
> regex thingy of mine, either.

Well, maybe you *do* only ever find bugs in places where you expected to
find them ...

Daniel McBrearty
email : danielmcbrearty at gmail.com
www.engoi.com : the multi - language vocab trainer
BTW : 0873928131