Ok, I appreciate all the input. Here are my thoughts now that I've had time to digest.

A) Use a better spam filter (either get resources to run a content-based filter or be aggressive with blacklists)
B) Use whitelists (either at the archiving stage based on the list name, or at the MTA stage based on the list server IP)
C) Make it so receiving spam is not so bad

Pat is right. Mail-Archive is an excellent candidate for running very aggressive blocklists because we only accept mail from list servers anyway. Blocklists are computationally cheap, easy to add, and I am already using some. Tony's counterargument about false positives doesn't hold water, because if a list is running afoul of blocklists, it is pretty much screwed anyway. I'm sorry if this clobbers a few people in the corner cases, but ultimately this is a matter of survival.

As for content filters, watching the growth of anti-filter spam in my personal inbox makes me very skeptical that content filters will make a big long-term difference, no matter what machine learning or statistical methods are used. Despite Kir's good experience, I have a deeply held conviction that programs have a hard enough time acting semi-intelligent without active adversaries, and this domain is the human's home court. I don't see enough benefit to justify anything other than a relatively small cost. I am guessing that my personal inbox is just a little ahead of most on the spam curve, and if so, the future is not looking pretty.

CONCLUSIONS:
[1] Stricter blocklists at the MTA level will be implemented immediately.
[2] Ignore content-based filtering unless some solution magically drops into my lap.

The second topic is whitelists. The #1 absolute golden rule of Mail-Archive is that I don't do any manual work. I see no way to compile a whitelist of MTAs and observe that rule. List admins are simply not going to know the IP of their MTA. Offloading manual header inspection to volunteers seems like too much work, hard to get right, and totally not fun.
And latency hinging on some action by me is a killer - I like to disappear into the wilderness every once in a while.

Whitelisting list names might be easier. This is what I think Stephen and Dror are suggesting, combined with some type of CAPTCHA - whether that is send/response based or something else. I hate the idea - the FAQ brags about Mail-Archive not requiring registration garbage or filling out forms. But I've accepted things in the past that were initially distasteful, so it's too early to rule it out.

CONCLUSION:
[1] Consider whitelists based on list name for the future.

Finally, I haven't had much feedback on whether the changes I already made substantially address the problem. Spam is NOT filling up Mail-Archive's disks. The flood of legitimate mail far outweighs the flood of spam. It's an annoyance, and a waste of resources, but not something that can't be handled. The main concern was that the list-of-lists was becoming embarrassing or useless due to the preponderance of false archives caused by spam. By adjusting the list-of-lists to only include lists with recent activity, I think the main threat has been mitigated. Or am I wrong and/or forgetting about something?

Again, my goal in this discussion is to protect Mail-Archive, not to punish spammers. Spammer Judgement Day will hopefully come Jan 1 when California's new laws kick in and are personally enforced by the Governator. Well, one can hope.

CONCLUSIONS:
[1] Request feedback from users to see if the main problem is mitigated.
[2] If so, put the majority of eggs in this basket.

Thanks, everyone.

-Jeff

_______________________________________________
Gossip mailing list
[EMAIL PROTECTED]
http://www.mail-archive.com/cgi-bin/mailman/listinfo/gossip