On Wed, 21 Jun 2006, Volker Kuhlmann wrote:

>> There may be something to look for. I'll forward you an email that
>> describes a problem. I'm hoping someone can send me a sample mailbox.
>
> Your hopes can be upheld ;)
>
> Attached a sample mailbox, and debug output. It's my spam box (using
> grepmail is nifty to check for a false positive that has gone missing),
> so don't read the text too closely.
>
> The box contains 5 messages. Search string is @orcon.net.nz, and it
> occurs in msg 4 and 5, but all msgs from 2 are returned as match. (If
> the box was 1000 msgs longer, they would all be returned as well.)
>
> From my reading of the debug output, Mbox/MessageParser fails to
> recognise the "^from " in msg 1 as being part of the msg body. I can say
> with certainty that mutt has never failed me in a decade with separating
> mbox msgs. All my emails for the past 4 years have enforced correct
> content-length: headers; I don't care what XYZ or DJB says, it works
> fine. Mbox/MessageParser 1.20 hasn't failed me yet either.

Well, the mailbox is not valid. The reason appears to be that antispam.rc
has truncated the mailbox in an invalid way. Namely, the multipart
boundaries have been ignored, so that the closing delimiter for the
boundary

----=_NextPart_000_0008_01C684B1.30F8FE30

is no longer there. From RFC 1341:

"The encapsulation boundary following the last body part is a
distinguished delimiter that indicates that no further body parts will
follow. Such a delimiter is identical to the previous delimiters, with the
addition of two more hyphens at the end of the line"
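
To make the rule concrete, here is a minimal Python sketch (not the parser's own Perl code). It assumes the string shown above is the value of the boundary parameter from the Content-Type header, and the function names are made up for illustration.

```python
def delimiters(boundary):
    """Return (part delimiter, closing delimiter) for a boundary value."""
    part = "--" + boundary          # separates body parts
    return part, part + "--"        # closing form adds two more hyphens


def is_terminated(body_lines, boundary):
    """True if the closing delimiter appears somewhere in the body."""
    _, closing = delimiters(boundary)
    return any(line.rstrip() == closing for line in body_lines)
```

A message whose body never contains the closing form is the ill-formed case described here.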

In previous versions this ill-formed mailbox went unnoticed because I was
not parsing multipart emails correctly. In those versions, if an email was
embedded as a part of a multipart email, I would incorrectly split the
multipart email at that point. In this case you *want* me to break the
email.

I assume that pine and mutt behave the way my previous, incorrect versions
did. (I just checked with pine, and it breaks the email even if I put the
ending boundary marker *after* the next email.)

What I'll try to do is this:

- Look for the ending boundary.
- If a "^From " appears before the ending boundary is found, ignore it and
  consider the email that follows to be a part of the current email.
- If the ending boundary is not found, consider the mailbox to be
  ill-formed. Emit a warning, back up, and search for the next "^From ".
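
The three steps above can be sketched roughly as follows. This is an illustrative Python outline under simplifying assumptions, not the code that will go into Mail::Mbox::MessageParser: the mailbox is already split into lines, any line starting with "From " counts as a candidate separator, only top-level boundaries are considered, and all names (split_mbox, next_from, find_boundary) are hypothetical.

```python
import re
import warnings

# Hypothetical helper names; this is not Mail::Mbox::MessageParser's API.
BOUNDARY_RE = re.compile(r'boundary="?([^";\s]+)"?', re.IGNORECASE)


def next_from(lines, i):
    """Index of the first "From " line at or after i, or len(lines)."""
    for j in range(i, len(lines)):
        if lines[j].startswith("From "):
            return j
    return len(lines)


def find_boundary(lines, start):
    """Return (boundary or None, index where the headers end)."""
    j = start + 1
    while j < len(lines) and lines[j].strip():
        j += 1                       # headers run until the first blank line
    match = BOUNDARY_RE.search(" ".join(lines[start + 1:j]))
    return (match.group(1) if match else None), j


def split_mbox(lines):
    """Split mbox lines into messages, honouring multipart boundaries."""
    messages = []
    start = next_from(lines, 0)
    while start < len(lines):
        boundary, body = find_boundary(lines, start)
        end = None
        if boundary is not None:
            closing = "--" + boundary + "--"
            # Step 1: look for the ending boundary, skipping any "From "
            # lines seen on the way (step 2).
            for j in range(body, len(lines)):
                if lines[j].rstrip() == closing:
                    end = next_from(lines, j + 1)
                    break
            if end is None:
                # Step 3: ill-formed message, warn and fall back.
                warnings.warn(
                    f"message at line {start + 1}: ending boundary not found"
                )
        if end is None:
            # Back up and split at the next "From " line.
            end = next_from(lines, start + 1)
        messages.append(lines[start:end])
        start = end
    return messages
```

The scan for the ending boundary runs to the end of the input when the boundary is missing, so an ill-formed message costs a pass over the rest of the mailbox before the fallback applies.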

There's a nasty performance hit for ill-formed mailboxes as the parser
searches the rest of the file for the missing boundary, but perhaps that
will be an incentive for people to fix their mailboxes. :)

Eduard, as Joey noted, your mailbox has an invalid boundary as well. My
solution above should work for your case too. I'll work on this tonight
and email you all when it's fixed.

Regards,
David

_____________________________________________________________________
David Coppit                           [EMAIL PROTECTED]
The College of William and Mary        http://coppit.org/

Single sanction punishment doesn't work for presidents or cheaters.
http://www.coppit.org/blog/archives/119

