On Sun, Jan 16, 2005 at 12:25:07PM -0800, Jefferson Cowart wrote:
> Just to chime in a note on this topic. The default scores (as distributed by
> upstream) are dynamically generated by having the rules analyze corpora of
> known spam and known ham. Based on which rules match on the SPAM and on the
> HAM, the scores are computed so as to minimize the number of false
> positives and false negatives. This means that if a rule ends up with a
> high positive score, it is a good indicator of SPAM (at least based on
> the corpus it was run against). The opposite is true of large negative
> scores and HAM.
>
> For more information check out
> http://wiki.apache.org/spamassassin/HowScoresAreAssigned
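As a rough illustration of the idea in the quoted paragraph, here is a toy perceptron that learns one score per rule from labelled sets of rule hits. It only adjusts scores when a training message lands on the wrong side of the spam threshold, and stops once every message is classified correctly. This is a sketch under stated assumptions, not SpamAssassin's actual score-generation code: the rule names, the learning rate, and the tiny corpus are made up, and the 5.0 threshold simply mirrors SpamAssassin's default spam threshold.

```python
# Toy perceptron: learns one score per rule from labelled rule-hit sets.
# NOT SpamAssassin's real score generator; the rule names, learning rate
# and example corpus below are hypothetical, chosen for illustration only.

THRESHOLD = 5.0  # a message is called spam when its total score >= this


def train_scores(corpus, rules, lr=0.5, max_epochs=100):
    """corpus: list of (set_of_rule_names_hit, is_spam) pairs."""
    scores = {rule: 0.0 for rule in rules}
    for _ in range(max_epochs):
        mistakes = 0
        for hits, is_spam in corpus:
            total = sum(scores[r] for r in hits)
            predicted_spam = total >= THRESHOLD
            if predicted_spam != is_spam:
                mistakes += 1
                # Push the scores of the rules that fired toward the right answer:
                # up for a missed spam (false negative), down for a false positive.
                step = lr if is_spam else -lr
                for r in hits:
                    scores[r] += step
        if mistakes == 0:  # every training message is now classified correctly
            break
    return scores


corpus = [
    ({"SPAMMY", "COMMON"}, True),
    ({"HAMMY", "COMMON"}, False),
    ({"SPAMMY"}, True),
    ({"SPAMMY", "HAMMY"}, False),
]
print(train_scores(corpus, ["SPAMMY", "HAMMY", "COMMON"]))
# prints {'SPAMMY': 5.0, 'HAMMY': -0.5, 'COMMON': 2.0}
```

The rule that fires only on spam ends up with a high positive score, while the rule that shows up alongside ham is pushed negative, which is the pattern the quoted paragraph describes.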
Thank you, Jefferson.

Mathieu, I hope this will be my last message to this bug report. Please visit the link above. Furthermore, I want to show you the following statistics from /usr/share/doc/spamassassin/rules/STATISTICS-set1.txt.gz (set 1 is no Bayes, with network tests):

  OVERALL%     SPAM%      HAM%     S/O    RANK   SCORE  NAME
    766289    506205    260084   0.661    0.00    0.00  (all messages)
   100.000   66.0593   33.9407   0.661    0.00    0.00  (all messages as %)
    30.961   46.6673    0.3922   0.992    0.48    0.14  RCVD_IN_SORBS_DUL
    31.331   47.2168    0.4133   0.991    0.47    1.66  RCVD_IN_NJABL_DUL

RCVD_IN_SORBS_DUL hits 47% of spam messages and 0.4% of ham messages (based on our test corpora). When RCVD_IN_NJABL_DUL is hit, there is statistically a 99.1% chance that it is spam. The perceptron assigned it a score of 1.66, and there's absolutely no way I'm going to change it in the default Debian distribution.

We're not discriminating; we're just using statistics. We don't make these scores up off the top of our heads.

If you feel obliged to respond to this mail, don't hold your breath; you likely won't get a response. You have the right to appeal to the Technical Committee if you so desire. I'm done discussing this.

-- 
Duncan Findlay
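For the two rule rows in the statistics quoted above, the printed S/O values are consistent with S/O = SPAM% / (SPAM% + HAM%), i.e. a ratio of per-class hit rates. Because it uses per-class rates, it does not depend on the test corpus being roughly two-thirds spam (the "(all messages)" row is a special case: its 0.661 is just the spam share of the corpus). The sketch below assumes that formula and recomputes the two values; the helper name spam_over_overall is hypothetical.

```python
# Sketch: recompute the S/O column of STATISTICS-set1.txt from SPAM% and HAM%.
# Assumption: for rule rows, S/O = SPAM% / (SPAM% + HAM%), which matches the
# two rows quoted above; spam_over_overall is a hypothetical helper name.


def spam_over_overall(spam_pct: float, ham_pct: float) -> float:
    """Share of a rule's per-class hit rate that falls on spam."""
    return spam_pct / (spam_pct + ham_pct)


# (SPAM%, HAM%, S/O as printed) taken from the statistics quoted above
rows = {
    "RCVD_IN_SORBS_DUL": (46.6673, 0.3922, 0.992),
    "RCVD_IN_NJABL_DUL": (47.2168, 0.4133, 0.991),
}

for name, (spam_pct, ham_pct, printed) in rows.items():
    so = spam_over_overall(spam_pct, ham_pct)
    print(f"{name}: S/O = {so:.3f} (table: {printed:.3f})")
```

Both recomputed values round to the printed ones (0.992 and 0.991), so the "99.1%" figure reads as the spam share of the rule's hits when spam and ham are weighted equally.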