[Perl-org-patches] content and www/lean/history......

Shlomi Fish shlomif at iglu.org.il
Tue Aug 18 21:18:43 GMT 2009


On Tuesday 18 August 2009 10:23:24 Matt S Trout wrote:
> On Tue, Aug 18, 2009 at 10:16:39AM +0300, Shlomi Fish wrote:
> > I think it shouldn't be too hard to write a crawler that will traverse
> > the pages and collect all the mailing lists entries and place them into
> > an XML/YAML/etc. file which can later be converted to HTML.
>
> That would be awesome.
>
> We probably want to think about how to try and slurp stuff in from META.yml
> too (did I already say that upthread? if so, sorry).
>
> It also strikes me we'll want some way to slice and dice the lists - per
> project is one thing, general area lists another, etc.
>
> There's some "human making a decision" involved in this one and we might
> need to play with different layouts before we're happy, so maybe better
> having this sort of stuff as tags?
>
> (sorry, I'm trying to make sure said file can be extended later - I'd
> suggest JSON though, I think, just for simplicity)

Hi!

Well, true to my word, I began working on a script to process the Perl 5
Wiki's HTML and extract the mailing list info. See the
"collect-mailing-lists-from-perl-5-wiki" mini-repository at:

https://svn.berlios.de/svnroot/repos/web-cpan/

The script is a bit hacky and rough around the edges, and it is still very
pedantic about the format of the mailing list entries in the wiki's HTML. I
discovered that the wiki's markup puts each nested list next to its item
rather than inside it:

<ul>

<li>Something</li>

<ul>
</ul>

<li>Item 2</li>

<ul>
</ul>
</ul>

And this needed to be handled. Right now I'm fixing the pages as I go, but I 
hope to work on this some more.
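
To illustrate the kind of handling this markup needs, here is a minimal
sketch in Python (not the actual script from the repository). It assumes
each sibling <ul> belongs to the <li> just before it, and it emits JSON as
suggested upthread. The names WikiListParser and parse_lists, and the
"text"/"children" keys, are made up for this example.

```python
import json
from html.parser import HTMLParser


class WikiListParser(HTMLParser):
    """Collect <ul>/<li> markup into a tree of {"text", "children"} dicts.

    Assumption: a <ul> that appears as a sibling of an <li> (instead of
    inside it) belongs to the <li> that immediately precedes it.
    """

    def __init__(self):
        super().__init__()
        self.root = []        # items of the top-level list(s)
        self.stack = []       # item lists of the currently open <ul> elements
        self.open_items = []  # items whose <li> has not been closed yet

    def handle_starttag(self, tag, attrs):
        if tag == "ul":
            if not self.stack:
                target = self.root
            elif self.stack[-1]:
                # Attach to the most recent item of the enclosing list,
                # whether or not its <li> is still open.
                target = self.stack[-1][-1]["children"]
            else:
                # Nested list with no preceding item: keep it under an
                # empty placeholder item so nothing is lost.
                placeholder = {"text": "", "children": []}
                self.stack[-1].append(placeholder)
                target = placeholder["children"]
            self.stack.append(target)
        elif tag == "li":
            parent = self.stack[-1] if self.stack else self.root
            item = {"text": "", "children": []}
            parent.append(item)
            self.open_items.append(item)

    def handle_endtag(self, tag):
        if tag == "ul" and self.stack:
            self.stack.pop()
        elif tag == "li" and self.open_items:
            item = self.open_items.pop()
            # Collapse runs of whitespace in the collected text.
            item["text"] = " ".join(item["text"].split())

    def handle_data(self, data):
        # Only text inside an open <li> is kept.
        if self.open_items:
            self.open_items[-1]["text"] += data


def parse_lists(html):
    """Return the list tree found in the given HTML string."""
    parser = WikiListParser()
    parser.feed(html)
    parser.close()
    return parser.root


if __name__ == "__main__":
    sample = """
    <ul>
    <li>Something</li>
    <ul>
    </ul>
    <li>Item 2</li>
    <ul>
    </ul>
    </ul>
    """
    print(json.dumps(parse_lists(sample), indent=2))
```

The sketch relies on every <li> having an explicit closing tag, as in the
markup above. Pages that omit </li> would need extra handling.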

Regards,

	Shlomi Fish

-- 
-----------------------------------------------------------------
Shlomi Fish       http://www.shlomifish.org/
Parody on "The Fountainhead" - http://xrl.us/bjria

God gave us two eyes and ten fingers so we will type five times as much as we
read.


