On Thu, Jan 8, 2015 at 3:12 PM, Kent R. Spillner <kspill...@acm.org> wrote:
> As the quilt maintainer I was surprised to see Landry mention quilt on
> icb as a port that depends on procmail; I didn't even realize that quilt
> used procmail behind the scenes. :)
>
> Turns out, quilt really only uses formail, and only to extract the value
> of headers.  Reading RFC 2822 I think we can safely do this with sed,
> instead.  (Well, technically gsed in this case; in fact, testing this
> exposed a few issues in our sed that I will investigate separately)
>
> I'd really appreciate some extra eyeballs on this new patch.  quilt's
> mail regress test looks pretty thorough and it still passes, and I also
> manually verified the sed output matches formail's output.  Cc'ing
> guenther@ in case he's got a spare minute or two to look this over and
> chime in.

I think sed is the wrong hammer for this, unfortunately, as doing
case-insensitive matching on the header field name per-rfc2822 (well
5322 now) is a royal pain here: I think you would need to use the hold
space, at which point it's a write-only sed script.

> ++# Extract RFC 2822 compliant header values, including Long Header Fields,
> ++# from messages
> ++
> ++extract_header_value()
> ++{
> ++    local header=$1
> ++
> ++    # Long Header Fields may span multiple lines, in which case CRLF
> ++    # is followed by space or tab (RFC 2822)
> ++    sed -n "/^${header}/,/^[^ \\t]/ { /^${header}/ { p; n; }; /^[^ \\t]/q; /^$/q; p; }" | sed "s/^${header}//"
> ++}

So instead, for this, I think awk works better:

extract_header_value()
{
        awk -F: -v h=$1 \
            'BEGIN { h = "^" tolower(h) "[[:space:]]*$" ; }
             /^[[:space:]]/ { if (matched) print; next; }
             { if (matched) exit;
               if (!match(tolower($1), h)) next;
               matched = 1;
               sub(/^[^:]*:[[:space:]]*/, "");
               print; }'
}

I have the final sub() there strip the leading whitespace from the
header field value, so in the rest of the code it no longer needs the
separate
        var=${var# }
assignment.

Note also that this version extracts the value of just the first
occurrence of the given header field, instead of extracting all of
them and concatenating the value like the current code using formail
does.  If the current behavior is desired then change the awk script
to:

        awk -F: -v h=$1 \
            'BEGIN { h = "^" tolower(h) "[[:space:]]*$" ; }
             /^[[:space:]]/ { if (matched) print; next; }
             { matched = 0;
               if (!match(tolower($1), h)) next;
               matched = 1;
               sub(/^[^:]*:[[:space:]]*/, "");
               print; }'


Philip Guenther
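
For anyone wanting to try the first-occurrence awk version by hand, here
is a minimal sketch that feeds it a made-up message.  It assumes the
header field name is passed without its trailing colon, which is a guess
about how the caller would invoke it and not something the patch
confirms.  The sample_message helper and its addresses are hypothetical
and exist only for this illustration.

```shell
# Sketch only: the first-occurrence awk version, exercised on a made-up
# message.  Assumption: the field name is passed WITHOUT the trailing
# colon (e.g. "subject", not "Subject:").
extract_header_value()
{
        awk -F: -v h="$1" \
            'BEGIN { h = "^" tolower(h) "[[:space:]]*$" }
             # Continuation line: part of the value only if we are
             # inside the header field we matched.
             /^[[:space:]]/ { if (matched) print; next }
             { if (matched) exit            # next field starts: done
               if (!match(tolower($1), h)) next
               matched = 1
               sub(/^[^:]*:[[:space:]]*/, "")   # drop "Name:" and blanks
               print }'
}

# Hypothetical sample: upper-case field name and a folded Subject.
sample_message()
{
        printf 'From: Alice <alice@example.org>\nSUBJECT: first line\n\tcontinued line\nTo: bob@example.org\n\nbody text\n'
}

sample_message | extract_header_value subject
```

With this input the function prints "first line" and then the
continuation line with its leading tab intact, which shows both the
case-insensitive match on the field name and the handling of folded
header fields.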