on Sun, Sep 07, 2003 at 10:20:38AM +0200, Christophe Courtois ([EMAIL PROTECTED]) wrote:
> On Sunday 7 September 2003 06:19, Karsten M. Self declaimed:
> > Well, another risk is people who use a fairly popular set of filters
> > which tag as spam anything that's more than a few percent (my own
> > threshold is 10%) non-roman characters, or specified in any of the
> > following charsets:
>
> It depends with who you communicate.
>
> When I see a mail from an anglo-saxon name, with a subject in English,
> and not already in a mailing-list folder, I'm 99% sure that it is a spam.
> This filter would be far more useful for me :-)

That's something you might be able to apply Bayesian training to.

The elegance of the charset filters is that it's trivial to apply a
filter based on a percentage of content being in an unreadable face.

Peace.

-- 
Karsten M. Self <[EMAIL PROTECTED]>        http://kmself.home.netcom.com/
 What Part of "Gestalt" don't you understand?
   "By failing to protect the public interest in free access to the
   products of the inventive and artistic genius -- indeed, by virtually
   ignoring the central purpose of the Copyright/Patent Clause [in the
   Constitution] -- the Court has quitclaimed to Congress its principal
   responsibility in this area of the law."
   -- Justice Stevens, J., dissenting, "Eldred v. Ashcroft"
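
For illustration, the percentage test described above could be sketched in Python roughly as follows. This is not the filter set mentioned in the thread (that isn't shown here); the function names, the use of Unicode character names to decide what counts as "roman", and counting only letters are all assumptions made for the example. The 10% default mirrors the threshold quoted above.

```python
import unicodedata


def non_roman_fraction(text: str) -> float:
    """Return the fraction of letters in `text` that are not Latin-script.

    Assumption: a letter counts as "roman" if its Unicode name contains
    "LATIN". Digits, punctuation and whitespace are ignored.
    """
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    non_roman = sum(
        1 for ch in letters if "LATIN" not in unicodedata.name(ch, "")
    )
    return non_roman / len(letters)


def exceeds_charset_threshold(text: str, threshold: float = 0.10) -> bool:
    """Hypothetical helper: True if more than `threshold` of the letters
    are non-roman (10% by default, as in the message above)."""
    return non_roman_fraction(text) > threshold
```

A message body would be decoded to text first and then passed to exceeds_charset_threshold(); accented Latin letters such as "é" still count as roman under this assumption, so French or German mail would not trip the test.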