On Fri, May 30, 2003 at 01:58:40PM +0200, Sven Luther wrote:
>On Thu, May 29, 2003 at 11:53:32AM -0400, David Dawes wrote:
>> On Thu, May 29, 2003 at 07:34:28AM +0200, Sven Luther wrote:
>> >On Thu, May 29, 2003 at 12:00:22AM -0400, Mike A. Harris wrote:
>> >> On Wed, 28 May 2003, Sven Luther wrote:
>> >>
>> >> >> > I was being sarcastic; his message was encoded with koi8-r, which,
>> >> >> > along with being HTML, is one of the indiscriminate reasons people
>> >> >> > block email (and get a good number of false positives).
>> >> >>
>> >> >> however, foreign language encoding is separate from html email.
>> >> >>
>> >> >> blocking based on foreign language encodings is not such a good idea.
>> >> >> blocking html is not so bad, though.
>> >> >
>> >> >You need to block multi-part mails with only one HTML part too, though,
>> >> >which is not so easy to do, I think.
>> >>
>> >> This filter doesn't catch *everything*, but for the last 6 years
>> >> or so, it has had zero false positives for me while subscribed to
>> >> limitless numbers of mailing lists.
>> >>
>> >> :0:
>> >> * ^Content-Type:.*text/html
>> >> HTML
>> >
>> >Yep, I have this too, but half the HTML spam I get passes through it,
>> >because it looks like this:
>> >
>> >Content-Type: multipart/alternative;
>> > boundary="E_BBFDE6F0B.95CA_CC.D7."
>> >...
>> >This is a multi-part message in MIME format.
>> >
>> >--E_BBFDE6F0B.95CA_CC.D7.
>> >Content-Type: text/html
>> >Content-Transfer-Encoding: quoted-printable
>> >...
>> >--E_BBFDE6F0B.95CA_CC.D7.--
>> >
>> >On the other hand, I don't want to catch the emails which have both a
>> >text and an HTML section, since they are mostly valid ones.
>>
>> The XFree86 mailing list filtering checks for a few different types of
>> html-only messages, including a few levels deep of nesting (which I've
>> seen in some spam). It does catch the occasional false-positive, but
>> it's fairly rare, and a reasonable tradeoff given its effectiveness.
>
>Are they available somewhere so I can take a look?

No, but the Perl MIME-tools package makes it easy to break down an email
message recursively.

This is getting off-topic for this list, but here's a code snippet:

    use MIME::Parser;
    use MIME::WordDecoder;
    ...
        $nparts = int($ent->parts);
        if ($nparts == 0) {
            $misc = $ent->head->get('content-type');
            if ($misc =~ /text\/html/i) {
                return "single part HTML message (1)";
            }
        } elsif ($nparts == 1) {
            my $e = ($ent->parts)[0];
            $nparts = int($e->parts);
            if ($nparts == 0) {
                $misc = $e->head->get('content-type');
                if ($misc =~ /text\/html/i) {
                    return "single part HTML message (2)";
                }
            } elsif ($nparts == 1) {
                # Maybe this should be done recursively.
                my $e2 = ($e->parts)[0];
                $nparts = int($e2->parts);
                if ($nparts == 0) {
                    $misc = $e2->head->get('content-type');
                    if ($misc =~ /text\/html/i) {
                        return "single part HTML message (3)";
                    }
                }
            }
        }
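
As the comment in the snippet suggests, the three hand-unrolled levels could
probably be folded into one recursive routine. The following is only a rough
sketch of that idea, not the filter the list actually runs. It assumes the
MIME-tools package is installed; the names html_only and classify_message are
made up for illustration. It also compares the parsed MIME type
(MIME::Entity's mime_type) instead of matching a regex against the raw
Content-Type header, and it follows single-part containers to any depth
rather than stopping at three.

```perl
use strict;
use warnings;
use MIME::Parser;

# Hypothetical helper, not the list's actual filter.
# Returns a reason string if $ent is HTML-only, otherwise nothing.
# $depth counts nesting levels, starting at 1 for the top-level entity.
sub html_only {
    my ($ent, $depth) = @_;
    $depth ||= 1;
    my @parts = $ent->parts;

    if (@parts == 0) {
        # Leaf entity: HTML-only if its MIME type is text/html.
        return "single part HTML message ($depth)"
            if $ent->mime_type eq 'text/html';
        return;
    }

    # A container holding exactly one part: look inside that part.
    return html_only($parts[0], $depth + 1) if @parts == 1;

    # Two or more parts (e.g. text plus HTML): not treated as HTML-only.
    return;
}

# Hypothetical wrapper: parse a raw message string and classify it.
sub classify_message {
    my ($raw) = @_;
    my $parser = MIME::Parser->new;
    $parser->output_to_core(1);    # keep decoded bodies in memory
    return html_only($parser->parse_data($raw));
}
```

With a message on standard input, something like
classify_message(do { local $/; <STDIN> }) should give the reason string, or
undef for a message that ought to pass. Messages with a text part alongside
the HTML part have two parts at the same level, so they fall through
untouched.
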
>> >Anyway, I have almost managed to write a sed script doing this, but I am
>> >not sure if it is possible to get the value of the boundary and match on
>> >it in the address pattern when using sed.
>>
>> If you're prepared to use perl, there are packages for breaking out the
>> mime structure.
>
>I would rather not use Perl; if anything, I would write a small OCaml
>program to do it, or maybe extend spamoracle, which I already call. The
>execution cost per mail would be lower this way.

I used Perl because there was a nice package available that took care
of the MIME parsing for me.

David
--
David Dawes
Founder/committer/developer The XFree86 Project
www.XFree86.org/~dawes