Ok, I appreciate all the input. Here are my thoughts now that I've
had time to digest. The options, as I understand them:

  A) Use a better spam filter (either get resources to run
     a content-based filter or be aggressive with blocklists)

  B) Use whitelists (either at the archiving stage based on the list
     name, or at the MTA stage based on the list server IP)

  C) Make it so receiving spam is not so bad

Pat is right. Mail-Archive is an excellent candidate for running very
aggressive blocklists because we only accept mail from list servers.
Blocklists are computationally cheap, easy to add, and I am already
using some. Tony's counterargument about false positives doesn't hold
water, because if a list is running afoul of blocklists, it is pretty
much screwed anyway. I'm sorry if this clobbers a few people in the
corner cases, but ultimately this is a matter of survival.
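
To illustrate why a blocklist check is cheap, here is a minimal Python
sketch of a DNS blocklist (DNSBL) lookup. It is not Mail-Archive's
actual setup: in practice the MTA does this itself, the zone name
dnsbl.example.org is a placeholder, and the function names are made up
for the example. The cost is one DNS query per connecting IP per zone.

```python
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNSBL query name: reversed IPv4 octets followed by the zone."""
    octets = ip.split(".")
    if len(octets) != 4 or not all(o.isdigit() and 0 <= int(o) <= 255 for o in octets):
        raise ValueError(f"not an IPv4 address: {ip!r}")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zones, resolve=socket.gethostbyname):
    """Return the first zone that lists `ip`, or None if no zone does.

    A listed address resolves to an A record (conventionally inside
    127.0.0.0/8); an unlisted one fails to resolve. `resolve` is
    injectable so the logic can be exercised without network access.
    """
    for zone in zones:
        try:
            resolve(dnsbl_query_name(ip, zone))
        except OSError:  # socket.gaierror is a subclass of OSError
            continue
        return zone
    return None
```

A connection from 192.0.2.10 would be checked with
is_listed("192.0.2.10", ["dnsbl.example.org"]) and refused if the
result is not None.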

As for content filters, watching the growth of anti-filter spam in my
personal inbox makes me very skeptical that content filters will make
a big long-term difference, no matter what machine learning or
statistical methods are used. Despite Kir's good experience, I have a
deeply held conviction that programs have a hard enough time acting
semi-intelligent without active adversaries, and this domain is the
humans' home court. I don't see enough benefit to justify anything
other than a relatively small cost. I am guessing that my personal
inbox is just a little ahead of most on the spam curve, and if so,
the future is not looking pretty.

CONCLUSIONS: 

[1]   Stricter blocklists at the MTA level will be implemented
      immediately.

[2]   Ignore content-based filtering unless some solution magically
      drops into my lap.

The second topic is whitelists. The #1 absolute golden rule of
Mail-Archive is that I don't do any manual work. I see no way to
compile a whitelist of MTAs while observing that rule. List admins are
simply not going to know the IP of their MTA. Offloading manual header
inspection to volunteers seems like too much work, hard to get right,
and totally not fun. And latency hinging on some action by me is a
killer - I like to disappear into the wilderness every once in a
while.

Whitelisting list names might be easier. This is what I think Stephen
and Dror are suggesting, combined with some type of CAPTCHA - whether
that is send/response based or something else. I hate the idea - the
FAQ brags about Mail-Archive not requiring registration garbage or
filling out forms. But I've accepted things in the past that were
initially distasteful, so it's too early to rule it out.

CONCLUSION:

[1]   Consider whitelists based on list name for the future.
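
One way a list-name whitelist with a send/response step could work
without manual effort is sketched below in Python. This is only an
illustration under assumptions: nothing here is decided or
implemented, the function names are hypothetical, and the use of an
HMAC token as the challenge is a choice made for the example.

```python
import hashlib
import hmac


def confirmation_token(list_name, secret):
    """Derive the token that would be sent to the list as a challenge."""
    msg = list_name.lower().encode("utf-8")
    return hmac.new(secret, msg, hashlib.sha256).hexdigest()


def confirm(list_name, token, secret, whitelist):
    """Whitelist list_name if the echoed token matches; return success."""
    expected = confirmation_token(list_name, secret)
    if hmac.compare_digest(expected, token):
        whitelist.add(list_name.lower())
        return True
    return False


def should_archive(list_name, whitelist):
    """Archive-stage check: only confirmed list names get an archive."""
    return list_name.lower() in whitelist
```

Because the token is derived from the list name and a server-side
secret, no per-list state is needed until a confirmation arrives, and
nobody has to approve anything by hand.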

Finally, I haven't had much feedback on whether the changes I already
made substantially address the problem. Spam is NOT filling up
Mail-Archive's disks. The flood of legitimate mail far outweighs the
flood of spam. It's an annoyance, and a waste of resources, but not
something that can't be handled. The main concern was that the
list-of-lists was becoming embarrassing or useless due to the
preponderance of false archives caused by spam. By adjusting the
list-of-lists to only include recent activity, I think the main
threat has been mitigated. Or am I wrong and/or forgetting about
something?
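
The recent-activity filter on the list-of-lists can be pictured with
the Python sketch below. It is an illustration, not the code that
runs on Mail-Archive: the function name, the shape of the input, and
the 30-day window are all assumptions made for the example.

```python
from datetime import datetime, timedelta


def recent_lists(last_post, now, max_age_days=30):
    """Return the names of lists whose newest message is recent enough.

    last_post maps a list name to the datetime of its latest archived
    message. An archive created by a one-off spam drops out of the
    result once that message is older than the window.
    """
    cutoff = now - timedelta(days=max_age_days)
    return sorted(name for name, ts in last_post.items() if ts >= cutoff)
```

A false archive still exists on disk under this scheme; it just stops
being advertised once it goes quiet.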

Again, my goal in this discussion is to protect Mail-Archive, not
to punish spammers. Spammer Judgement Day will hopefully come
Jan 1 when California's new laws kick in and are personally
enforced by the Governator. Well, one can hope.

CONCLUSIONS:

[1]   Request feedback from users to see if the main problem is
      mitigated.
[2]   If so, put the majority of eggs in this basket.

Thanks, everyone.

-Jeff

_______________________________________________
Gossip mailing list
[EMAIL PROTECTED]
http://www.mail-archive.com/cgi-bin/mailman/listinfo/gossip
