On Fri, Feb 08, 2002 at 12:48:55AM -0800, Blars Blarson wrote:
| In article <[EMAIL PROTECTED]> [EMAIL PROTECTED] writes:
| [spamassassin]
| >| The default rule scoring seems pretty far off to me though.
| >Can you expand on this?
|
| (These comments are based on the few dozen mainly spam messages I've fed
| to "spamassassin -t", and some reading of the spamassassin mailing list
| archives.)
|
| Low scores for some obvious spam-only indicators (javascript -- no
| valid mail will ever contain javascript)

That's a good point.

| Any html is a strong spam indicator.

Depends on the user. Some groups of people tend to send HTML, or both
HTML and plain text. (Not that I condone it.)

| High scores for some things that could easily be tripped by valid email.
| (common spam phrases)

The spam phrases were messed up in 2.01. There was a typo (or a thinko)
in one of the arithmetic expressions.

| Negative score for long messages. Long messages are more likely to be
| spam, not less.

Depends on the context of the message. Some people write a lot. A
newsletter can be long. A message that includes lots of log output or
system details can be long too.

| The current auto-whitelist implementation seems to have some problems.

Yep. It's going to be fixed in 2.02. (That's what's holding back the
release.)

| I haven't yet figured out how to configure which DNSBLs are used.

I don't know how easy that is. The config file shows that a Perl
function is called to test them. Perhaps a Perl function needs to be
written for each one?

| It only seems to catch about 60% of the spam that gets past my other
| filters. (ordb, osirusoft, blarsbl, valid rDNS of relay, valid domain
| in envelope from) (These catch about 90% of the spam, and an occasional
| valid email.)
|
| I think most of these problems stem from their mail base their scores
| are based on being very different from the mail I receive.

This is the likeliest cause of your problems with it. If you can build
your own corpus, then you can run the GA (the genetic algorithm used to
set the scores) yourself and get default scores tailored to your mail.

One issue the developers face is coming up with a corpus and scores that
work for everyone. For geeks, any message talking about making money is
probably spam. For a company's management (whose mail admin runs SA),
newsletters and the like that discuss markets and making money are
wanted mail, not spam.

About the only thing I can suggest, if you really want to give SA
another chance, is to customize the scores for your usage.
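
To make "customize the scores for your usage" a bit more concrete, here
is a rough Python sketch of the underlying idea, assuming a simple
additive model: each rule that fires adds its score, and a message is
tagged as spam once the total reaches a threshold (5.0 is assumed here).
The rule names, scores, and corpus below are all made up for
illustration, and this is not SpamAssassin's code or its GA. It only
measures how a hand-edited score table does against mail you have
labelled yourself. In SA itself, per-user overrides go in user_prefs as
"score RULE_NAME value" lines.

```python
from dataclasses import dataclass

# Hypothetical rule names and scores, for illustration only; these are not
# SpamAssassin's shipped defaults.
DEFAULT_SCORES = {
    "HTML_MESSAGE": 0.5,
    "JAVASCRIPT": 0.8,
    "SPAM_PHRASE": 2.5,
    "LONG_MESSAGE": -1.0,
}


@dataclass
class Message:
    rule_hits: frozenset  # names of the rules that fired on this message
    is_spam: bool         # hand-assigned label from your own corpus


def total_score(hits, scores):
    """Additive model: sum the scores of every rule that fired.

    Rules missing from the table contribute 0.0, so an override table only
    needs to list the rules you want to change.
    """
    return sum(scores.get(rule, 0.0) for rule in hits)


def evaluate(corpus, scores, threshold=5.0):
    """Return (false_positives, false_negatives) for one score table.

    The 5.0 default threshold is an assumption for this sketch.
    """
    false_pos = false_neg = 0
    for msg in corpus:
        flagged = total_score(msg.rule_hits, scores) >= threshold
        if flagged and not msg.is_spam:
            false_pos += 1  # legitimate mail tagged as spam
        elif not flagged and msg.is_spam:
            false_neg += 1  # spam that slipped through
    return false_pos, false_neg


# A tiny hand-labelled corpus (made up for illustration).
corpus = [
    Message(frozenset({"JAVASCRIPT", "HTML_MESSAGE", "SPAM_PHRASE"}), True),
    Message(frozenset({"SPAM_PHRASE", "LONG_MESSAGE"}), True),
    Message(frozenset({"SPAM_PHRASE"}), False),   # e.g. a markets newsletter
    Message(frozenset({"LONG_MESSAGE"}), False),  # e.g. a long post with logs
]

# Per-user tweaks layered over the defaults: treat JavaScript as a strong
# spam sign, and stop rewarding long messages.
tuned = {**DEFAULT_SCORES, "JAVASCRIPT": 4.0, "LONG_MESSAGE": 0.0}

print("defaults:", evaluate(corpus, DEFAULT_SCORES))  # defaults: (0, 2)
print("tuned:   ", evaluate(corpus, tuned))           # tuned:    (0, 1)
```

False positives and false negatives are counted separately because
losing a real message usually costs more than letting a spam through.
Calling evaluate(corpus, tuned, threshold=2.0) here returns (1, 0): it
catches both spams but tags the newsletter, which is exactly the
"works for everyone" problem described above.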

-D

-- 
An anxious heart weighs a man down, but a kind word cheers him up.
Proverbs 12:25