[Bast-commits] r9141 - ironman/branches/mk-ii/IronMan-Web
idn at dev.catalyst.perl.org
Wed Apr 14 12:01:58 GMT 2010
Author: idn
Date: 2010-04-14 13:01:58 +0100 (Wed, 14 Apr 2010)
New Revision: 9141
Modified:
ironman/branches/mk-ii/IronMan-Web/todo.pod
Log:
Spam handling discussions
Modified: ironman/branches/mk-ii/IronMan-Web/todo.pod
===================================================================
--- ironman/branches/mk-ii/IronMan-Web/todo.pod 2010-04-13 23:35:32 UTC (rev 9140)
+++ ironman/branches/mk-ii/IronMan-Web/todo.pod 2010-04-14 12:01:58 UTC (rev 9141)
@@ -98,8 +98,103 @@
This should probably be by feed or post URL.
+From IRC discussion:
+
+ 12:55 < mst> for the live one it's easy
+ 12:55 < mst> just cp the sqlite db first
+ 12:55 < castaway> on ironman I always do: login, cd plagger, cp subscriptions.db scubscriptions_pewdespam.db; sqlite3 subscriptions.db
+ 12:56 < castaway> any mess, re-copy and start again ;)
+ 12:56 < castaway> backups++
+
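The copy-first workflow above ("cp the sqlite db first", then experiment on the live copy) can be sketched as follows. This is an illustrative Python sketch for a Perl project; the function name and backup path are made up here, only the `subscriptions.db` filename comes from the transcript.

```python
import shutil
import sqlite3

def backup_then_open(db_path: str, backup_path: str) -> sqlite3.Connection:
    """Copy the SQLite file first, then open the original for experimentation.

    Any mess: restore by copying backup_path back over db_path and start again.
    """
    shutil.copy2(db_path, backup_path)
    return sqlite3.connect(db_path)
```

The point of the ordering is that the backup is taken before any connection is opened, so the copy is a clean pre-experiment snapshot.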
=item * A name should not appear on the index page until at least one post has been made.
=back
+Note from robinsmidsrod in #epo-ironman:
+
+ 12:20 < robinsmidsrod> idn: I just wanted to suggest to you to use the http_bl support from https://www.projecthoneypot.org to reduce spam entering the ironman database - I've
+ successfully used it on my blog - now I barely have spam entering my blog, and I don't have a captcha installed
+ 12:21 < robinsmidsrod> I used the mod_httpbl apache implementation from https://www.projecthoneypot.org/httpbl_implementations.php
+ 12:21 < idn> robinsmidsrod: Thanks for the suggestion, I'll stick it in the todo list for investigation.
+ 12:23 < robinsmidsrod> sorry, my bad, I didn't use the mod_httpbl module - I actually used a b2evolution plugin - but projecthoneypot has a simple DNS-based API, so it shouldn't be too
+ hard to make a perl module to handle it
+ 12:23 < robinsmidsrod> it works just as any other DNS-based blacklist
+ 12:24 < robinsmidsrod> but the cool thing is that you can actually choose the threshold level for when you will block users
+
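The http:BL lookup described above works like any other DNS blacklist: query `{accesskey}.{reversed-ip}.dnsbl.httpbl.org` and interpret the `127.days.threat.type` A-record answer, where the last octet is a visitor-type bitmask (1 = suspicious, 2 = harvester, 4 = comment spammer). A minimal sketch, assuming Project Honey Pot's published response format; the access key and threshold value are illustrative placeholders:

```python
import socket

HTTPBL_ZONE = "dnsbl.httpbl.org"

def httpbl_query_name(access_key: str, visitor_ip: str) -> str:
    """Build the http:BL query name: access key + octet-reversed IP + zone."""
    reversed_ip = ".".join(reversed(visitor_ip.split(".")))
    return f"{access_key}.{reversed_ip}.{HTTPBL_ZONE}"

def parse_httpbl_response(addr: str) -> dict:
    """Interpret a 127.d.t.v answer: days since last activity, threat score,
    and visitor-type bitmask."""
    octets = [int(o) for o in addr.split(".")]
    if octets[0] != 127:
        raise ValueError("not an http:BL response")
    days, threat, vtype = octets[1], octets[2], octets[3]
    return {
        "days_since_last_activity": days,
        "threat_score": threat,
        "is_suspicious": bool(vtype & 1),
        "is_harvester": bool(vtype & 2),
        "is_comment_spammer": bool(vtype & 4),
    }

def is_blocked(addr: str, threshold: int = 25) -> bool:
    """Block when the visitor is flagged and the threat score crosses a
    site-chosen threshold (the configurable level mentioned above)."""
    info = parse_httpbl_response(addr)
    flagged = (info["is_suspicious"] or info["is_harvester"]
               or info["is_comment_spammer"])
    return flagged and info["threat_score"] >= threshold

# Live use would be: socket.gethostbyname(httpbl_query_name(key, remote_ip));
# an NXDOMAIN failure means the IP is not listed at all.
```

The threshold argument is the "cool thing" from the transcript: each site decides how fishy an IP must look before it is blocked or redirected.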
+Discussion followed:
+
+ 12:24 < idn> That's an interesting idea that I hadn't thought of.
+ 12:25 < idn> How would you look to implement it, at collection time or at signup time?
+ 12:25 < robinsmidsrod> actually, if you run your own DNS server (and http server) I would suggest to support the project - it is an awesome project (I've been a member for a bit over a
+ year)
+ 12:25 < robinsmidsrod> the new site is dynamic, right?
+ 12:25 < idn> Yes
+ 12:26 < idn> Hmm, I'd love to support the project, but my employer isn't very community minded or responsible in that respect.
+ 12:26 < robinsmidsrod> so just look up REMOTE_IP, do a lookup against projecthoneypot BL (via DNS) and check the response - if it shows something that looks bad, just block or redirect
+ the user to a page that explains the problem, or enable captcha
+ 12:26 < robinsmidsrod> idn: well, I support it with my private stuff
+ 12:27 < robinsmidsrod> anyone can donate a spare MX pointer ;)
+ 12:27 < robinsmidsrod> for some hostname you would probably never use
+ 12:27 < idn> Ah, I'm with you.
+ 12:28 < robinsmidsrod> mine is XXXmailserver.smidsrod.no which points to a honeypot
+ 12:29 < idn> So all I need is some domains rather than any actual kit ;) All of mine are on 123reg which neatly solves that problem.
+ 12:30 < robinsmidsrod> me alone have helped catching approx. 20 harvesters and spammers in the last year
+ 12:30 < robinsmidsrod> which is nice to know :)
+ 12:31 < robinsmidsrod> you can support them by donating an MX entry in your DNS, setting up a "hidden" link on your own websites linking to a honeypot, or you can setup an actual
+ honeypot - I've only done the two first
+ 12:32 < idn> That seems like a good idea, I'll put it forward to the boss too and see if we can't do something here at work.
+ 12:32 < robinsmidsrod> what I do in my blog is that if the remote_ip looks suspicious I redirect the user to my honeypot page, which means that those IPs that are already somewhat fishy
+ will be redirected to something that will make them more fishy if they harvest it :)
+ 12:33 < idn> I had been contemplating using spam assassin to scan content too
+ 12:34 < robinsmidsrod> idn: this is my honeypot page: http://minmailserver.smidsrod.no/
+ 12:34 < robinsmidsrod> if you look at the HTML content you'll see that there are some hidden links that harvesters will catch
+ 12:35 < idn> There are a couple of problems to address, one is the existing bad feeds (most of which don't appear because they don't use the right keywords) and secondly preventing new
+ bad feeds.
+ 12:36 < idn> The former is more of a problem due to the way in which the list of signed up users appears on the front page.
+ 12:36 < robinsmidsrod> honeypot will only help with the new bad feeds
+ 12:36 < idn> Possibly, the site hosting the spam might well be listed
+ 12:37 * mst still thinks "only appears in the right bar if they've got at least one post" would be a start
+ 12:37 < robinsmidsrod> if you have any IP addresses linked with existing content you could of course manually run it through their BL and see what you find
+ 12:37 < robinsmidsrod> mst: I agree with that one
+ 12:37 < mst> actual spam posts will get nailed pretty quickly, I think
+ 12:37 < idn> mst: Yes, that's what's in the todo I think
+ 12:38 < robinsmidsrod> a "report spam" feature is available?
+ 12:38 < castaway> robinsmidsrod: no but that'd be handy, care to write one?
+ 12:38 < idn> I've thought about that and I've mixed opinions. It seems like it could be open to abuse and or create work for someone to deal with.
+ 12:38 < castaway> idn: we run a website, its gonna create work ;
+ 12:38 < castaway> ;)
+ 12:39 < idn> Yes. But I like to try my best to minimise that ;)
+ 12:39 < robinsmidsrod> castaway: I don't have any time available, 200% workload with work + full time studies, but I can explain how it could be created to mitigate moderator
+ intervention.
+ 12:40 < idn> I was contemplating some kind of scoring system that would blacklist a feed once so many reports have been received from different requesting hosts, but it's still a
+ little open to abuse. Coupled with administrative notification and oversight to re-enable if needed.
+ 12:40 < robinsmidsrod> castaway: create a form (POST) with a "Report spam" button so that behaving robots won't access it. When enough people have clicked that button the article will
+ be blacklisted until a moderator actually clears it from blacklist
+ 12:40 < castaway> idn: like, say, bayes? ;)
+ 12:41 < robinsmidsrod> idn: exactly the same as I thought
+ 12:41 < castaway> robinsmidsrod: makes sense.. (user moderation, yay)
+ 12:41 < robinsmidsrod> that way either the author needs to complain to a moderator that his post doesn't show up
+ 12:41 < idn> castaway: Erm, not quite, though my understanding of things statistical could be written on the back of a very small pin head....
+ 12:41 < robinsmidsrod> because it got blacklisted
+ 12:42 < idn> That wouldn't work in that each and every time the feed generated a spam, it would need to be black listed. Though I like where you're going.
+ 12:42 < robinsmidsrod> mst: do you have a suggestion on how to calculate how many reports should cause blacklisting to be triggered?
+ 12:42 < idn> Blacklist the individual post with some kind of hysteresis, then blacklist the feed once enough posts have been blacklisted.
+ 12:42 < mst> idn: er, what?
+ 12:43 < mst> why would you have to regen?
+ 12:43 < robinsmidsrod> idn: or enable reporting spam on both posts and feeds
+ 12:43 < mst> oh, each time. yes.
+ 12:43 < mst> idn: that's simple.
+ 12:43 < mst> two blacklists and the feed goes.
+ 12:43 < idn> There we go then :)
+ 12:44 < robinsmidsrod> I'd suggest to put the feed in a quarantine so that it is easy for a moderator to un-blacklist feeds - and once a feed has been un-blacklisted you would increase
+ the blacklist threshold
+ 12:45 < robinsmidsrod> sometimes member blogs get hijacked and start generating spam, but I guess that problem is much smaller than spammer blogs in general
+ 12:46 < idn> I wouldn't remove the feed, they could just sign it up again. I'd opt for blacklisting it and never collecting it again
+ 12:47 < robinsmidsrod> if the feed has been in the blacklist for , let's say a month or two, it will be automatically purged from the database
+ 12:47 < robinsmidsrod> what you said makes more sense yes :)
+ 12:47 < robinsmidsrod> and if they try to signup again you redirect them to a projecthoneypot page :)
+ 12:49 < robinsmidsrod> just make sure the report spam button is a form/post button, not a link, or else you'll gather report spam-reports en masse when misbehaving robots come in
+ 12:51 < idn> Hmm, that's a good place to use the honeypot stuff too.
+ 12:52 < robinsmidsrod> catch the spammers/harvesters by their own bad behaviour :)
+
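The moderation scheme the discussion converges on (a post is blacklisted once enough distinct hosts report it, "two blacklists and the feed goes", and a moderator clearing a quarantined item raises the threshold) could be sketched roughly as below. This is a hedged Python illustration; class, method names, and default thresholds are invented here, not an agreed design:

```python
class SpamModeration:
    """Sketch of the discussed thresholds: blacklist a post after enough
    distinct reporting hosts, blacklist a feed once two of its posts are
    blacklisted, and raise the bar when a moderator clears a post."""

    def __init__(self, report_threshold: int = 3, feed_strike_limit: int = 2):
        self.report_threshold = report_threshold
        self.feed_strike_limit = feed_strike_limit
        self.reports = {}             # post_id -> set of reporting hosts
        self.blacklisted_posts = set()
        self.blacklisted_feeds = set()  # kept, never collected again
        self.post_feed = {}           # post_id -> feed_id

    def report_spam(self, post_id: str, feed_id: str, reporter_host: str) -> None:
        """Record one report; a set of hosts dedupes repeat clicks from one IP."""
        self.post_feed[post_id] = feed_id
        hosts = self.reports.setdefault(post_id, set())
        hosts.add(reporter_host)
        if len(hosts) >= self.report_threshold:
            self.blacklisted_posts.add(post_id)
            strikes = sum(1 for p in self.blacklisted_posts
                          if self.post_feed.get(p) == feed_id)
            if strikes >= self.feed_strike_limit:
                # Keep the feed row so re-signup is refused; just stop collecting.
                self.blacklisted_feeds.add(feed_id)

    def moderator_clear(self, post_id: str) -> None:
        """Un-blacklist a quarantined post and increase the threshold,
        per the suggestion to make repeat blacklisting harder to trigger."""
        self.blacklisted_posts.discard(post_id)
        self.report_threshold += 1
```

Counting distinct reporting hosts rather than raw clicks, and exposing the report button only as a POST form, are the two abuse-mitigation points raised above: repeat reports from one host don't accumulate, and well-behaved robots never submit the form.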
=cut