Your message dated Mon, 8 May 2006 16:37:47 -0400 with message-id <[EMAIL PROTECTED]> has caused Debian Bug report #365151, regarding "libmail-mbox-messageparser-perl: message splitting breaks", to be marked as having been forwarded to the upstream software author(s) Eduard Bloch <[EMAIL PROTECTED]>, [EMAIL PROTECTED], David Coppit <[EMAIL PROTECTED]>.
(NB: If you are a system administrator and have no idea what I am talking about this indicates a serious mail system misconfiguration somewhere. Please contact me immediately.)

Debian bug tracking system administrator
(administrator, Debian Bugs database)
--- Begin Message ---
Eduard Bloch wrote:
> my program mail-expire uses this module to split mbox files into
> individual messages. Sometimes, however, the end of file is reported too
> early and data is _lost_ because of that. I did not try to investigate
> the issue yet, test data is in:
> http://people.debian.org/~blade/debian-user-german.Apr_2006.bz2
> and the current version of the script is attached, with debugging output
> enabled. If you look at that, it stops splitting the contents at <[EMAIL
> PROTECTED]> and returns the rest as one big message.

It looks like the problem is in the parsing of the MIME boundary
parameter. The header looks like this:

  Content-Type: multipart/signed; boundary=Sig_vBdOhvW1OXTFVp5Uz7Tcu_+; protocol="application/pgp-signature"; micalg=PGP-SHA1

Note that the boundary string is not quoted. The library parses it with
this:

    # Are nonquoted parameter values allowed to have spaces? I assume not.
    if ($content_type_header =~ /boundary *= *"([^"]*)"/i ||
        $content_type_header =~ /boundary *= *\b(\S+)\b/i)

This matches "Sig_vBdOhvW1OXTFVp5Uz7Tcu_" out of the string and leaves
off the "+" at the end. \S+ first grabs "Sig_vBdOhvW1OXTFVp5Uz7Tcu_+;".
The trailing \b then forces it to back off to the "_", because neither
"+" nor ";" is a word character. This doesn't conform to RFC 2046, which
defines the boundary as:

     boundary := 0*69<bchars> bcharsnospace
     bchars := bcharsnospace / " "
     bcharsnospace := DIGIT / ALPHA / "'" / "(" / ")" /
                      "+" / "_" / "," / "-" / "." /
                      "/" / ":" / "=" / "?"

(And yes, AFAICS even unquoted spaces are legal.)

This should work better. It passes the test suite and successfully parses
the mailbox from this bug report.

Index: Grep.pm
===================================================================
--- Grep.pm	(revision 12420)
+++ Grep.pm	(working copy)
@@ -177,9 +177,8 @@
     my $content_type_header = $1;
     $content_type_header =~ s/$endline//g;
 
-    # Are nonquoted parameter values allowed to have spaces? I assume not.
     if ($content_type_header =~ /boundary *= *"([^"]*)"/i ||
-        $content_type_header =~ /boundary *= *\b(\S+)\b/i)
+        $content_type_header =~ /boundary *= *([-0-9A-Za-z'()+_,.\/:=? ]*[-0-9A-Za-z'()+_,.\/:=?])/i)
     {
       return $1
     }
Index: Perl.pm
===================================================================
--- Perl.pm	(revision 12420)
+++ Perl.pm	(working copy)
@@ -248,9 +248,8 @@
     my $content_type_header = $1;
     $content_type_header =~ s/$endline//g;
 
-    # Are nonquoted parameter values allowed to have spaces? I assume not.
     if ($content_type_header =~ /boundary *= *"([^"]*)"/i ||
-        $content_type_header =~ /boundary *= *\b(\S+)\b/i)
+        $content_type_header =~ /boundary *= *([-0-9A-Za-z'()+_,.\/:=? ]*[-0-9A-Za-z'()+_,.\/:=?])/i)
     {
       return $1
     }

-- 
see shy jo
--- End Message ---
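The difference between the old and patched unquoted-value regexes can be checked directly against the Content-Type header quoted in this report. This is a minimal standalone Perl sketch, not code from Mail::Mbox::MessageParser. The helper names `old_boundary` and `new_boundary` are hypothetical. The sketch assumes the header has already been unfolded, as the library's `s/$endline//g` line does.

```perl
use strict;
use warnings;

# Content-Type header from the mailbox in this bug report (already unfolded).
my $header = 'Content-Type: multipart/signed; boundary=Sig_vBdOhvW1OXTFVp5Uz7Tcu_+; '
           . 'protocol="application/pgp-signature"; micalg=PGP-SHA1';

# Hypothetical helper: pre-patch logic (quoted value, else \b(\S+)\b).
sub old_boundary {
    my ($h) = @_;
    if ($h =~ /boundary *= *"([^"]*)"/i || $h =~ /boundary *= *\b(\S+)\b/i) {
        return $1;
    }
    return;
}

# Hypothetical helper: patched logic. The unquoted branch accepts RFC 2046
# bchars and requires the value to end in a bcharsnospace.
sub new_boundary {
    my ($h) = @_;
    if ($h =~ /boundary *= *"([^"]*)"/i
        || $h =~ /boundary *= *([-0-9A-Za-z'()+_,.\/:=? ]*[-0-9A-Za-z'()+_,.\/:=?])/i) {
        return $1;
    }
    return;
}

my $old = old_boundary($header);
my $new = new_boundary($header);
print "old: $old\n";    # old: Sig_vBdOhvW1OXTFVp5Uz7Tcu_
print "new: $new\n";    # new: Sig_vBdOhvW1OXTFVp5Uz7Tcu_+
```

The quoted-string branch is tried first in both versions. The change only affects headers whose boundary is unquoted, like the one in this report.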
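The RFC 2046 boundary grammar quoted in the message translates almost directly into an anchored Perl regex. Under that grammar, a valid value is 1 to 70 characters of bchars and does not end in a space. The sketch below uses a hypothetical `is_rfc2046_boundary` helper that is not part of the library. It only checks the character grammar. Separately, RFC 2045 allows an unquoted parameter value only if it is a token, and tokens exclude spaces and tspecials such as "(", ")", ",", "/", ":", "=" and "?". Values using those characters should therefore arrive quoted. The patched regex accepts them unquoted anyway, which is the lenient choice for a parser.

```perl
use strict;
use warnings;

# bcharsnospace (RFC 2046): DIGIT / ALPHA / "'" / "(" / ")" / "+" / "_" /
#                           "," / "-" / "." / "/" / ":" / "=" / "?"
my $bcharsnospace = qr{[0-9A-Za-z'()+_,\-./:=?]};

# bchars := bcharsnospace / " "
my $bchars = qr{(?:$bcharsnospace| )};

# Hypothetical helper: returns 1 if $value matches
#   boundary := 0*69<bchars> bcharsnospace
# i.e. 1 to 70 characters, all bchars, and not ending in a space.
sub is_rfc2046_boundary {
    my ($value) = @_;
    return 0 unless defined $value;
    # (?:...) makes explicit that {0,69} repeats one whole bchar.
    return $value =~ /\A(?:$bchars){0,69}$bcharsnospace\z/ ? 1 : 0;
}

for my $candidate ('Sig_vBdOhvW1OXTFVp5Uz7Tcu_+', 'has space inside', 'trailing ', 'semi;colon') {
    printf "%-30s %s\n", "'$candidate'",
        is_rfc2046_boundary($candidate) ? 'valid' : 'invalid';
}
```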