On Thu, Jan 8, 2015 at 3:12 PM, Kent R. Spillner <kspill...@acm.org> wrote:
> As the quilt maintainer I was surprised to see Landry mention quilt on
> icb as a port that depends on procmail; I didn't even realize that quilt
> used procmail behind the scenes.  :)
>
> Turns out, quilt really only uses formail, and only to extract the value
> of headers.  Reading RFC 2822 I think we can safely do this with sed,
> instead.  (Well, technically gsed in this case; in fact, testing this
> exposed a few issues in our sed that I will investigate separately)
>
> I'd really appreciate some extra eyeballs on this new patch.  quilt's
> mail regress test looks pretty thorough and it still passes, and I also
> manually verified the sed output matches formail's output.  Cc'ing
> guenther@ in case he's got a spare minute or two to look this over and
> chime in.

I think sed is the wrong hammer for this, unfortunately, as doing
case-insensitive matching on the header field name per RFC 2822 (well,
RFC 5322 now) is a royal pain here: I think you would need to use the
hold space, at which point it's a write-only sed script.


> ++# Extract RFC 2822 compliant header values, including Long Header Fields,
> ++# from messages
> ++
> ++extract_header_value()
> ++{
> ++      local header=$1
> ++
> ++      # Long Header Fields may span multiple lines, in which case CRLF
> ++      # is followed by space or tab (RFC 2822)
> ++      sed -n "/^${header}/,/^[^ \\t]/ { /^${header}/ { p; n; }; /^[^ \\t]/q; /^$/q; p; }" | sed "s/^${header}//"
> ++}

So instead, for this, I think awk works better:

extract_header_value()
{
    awk -F: -v h="$1" \
        'BEGIN { h = "^" tolower(h) "[[:space:]]*$" ; }
        /^[[:space:]]/ { if (matched) print; next; }
        { if (matched) exit; if (!match(tolower($1), h)) next;
          matched = 1; sub(/^[^:]*:[[:space:]]*/, ""); print; }'
}

I have the final sub() there strip the leading whitespace from the
header field value, so in the rest of the code it no longer needs the
separate
    var=${var# }
assignment.
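
For illustration, here is a minimal sketch of how the function behaves on
a folded, differently-cased header.  The sample message and its addresses
are made up, and the header name is assumed to be passed without its
trailing colon, since the script compares it against the text before the
first colon.  "$1" is quoted here as a precaution.

```shell
extract_header_value()
{
    awk -F: -v h="$1" \
        'BEGIN { h = "^" tolower(h) "[[:space:]]*$" ; }
        /^[[:space:]]/ { if (matched) print; next; }
        { if (matched) exit; if (!match(tolower($1), h)) next;
          matched = 1; sub(/^[^:]*:[[:space:]]*/, ""); print; }'
}

# Hypothetical sample message: the Subject field is upper-cased and folded
# onto a second line.
msg='From: a@example.org
SUBJECT: a long
 folded subject
To: b@example.org

body'

printf '%s\n' "$msg" | extract_header_value Subject
# prints:
# a long
#  folded subject
```

The continuation line keeps its leading whitespace; only the first line
of the value has the whitespace after the colon stripped.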

Note also that this version extracts the value of just the first
occurrence of the given header field, instead of extracting all of
them and concatenating the values, as the current formail-based code
does.  If the current behavior is desired then change the awk script
to:

    awk -F: -v h="$1" \
        'BEGIN { h = "^" tolower(h) "[[:space:]]*$" ; }
        /^[[:space:]]/ { if (matched) print; next; }
        { matched = 0; if (!match(tolower($1), h)) next;
          matched = 1; sub(/^[^:]*:[[:space:]]*/, ""); print; }'
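
To make the difference concrete, the sketch below runs both variants over
a message with a repeated header.  The function names extract_first and
extract_all and the sample message are hypothetical, chosen only for this
comparison; the header name is again assumed to be passed without the colon.

```shell
# First-occurrence variant: exits at the first header after the match.
extract_first()
{
    awk -F: -v h="$1" \
        'BEGIN { h = "^" tolower(h) "[[:space:]]*$" ; }
        /^[[:space:]]/ { if (matched) print; next; }
        { if (matched) exit; if (!match(tolower($1), h)) next;
          matched = 1; sub(/^[^:]*:[[:space:]]*/, ""); print; }'
}

# All-occurrences variant: resets "matched" at each new header instead.
extract_all()
{
    awk -F: -v h="$1" \
        'BEGIN { h = "^" tolower(h) "[[:space:]]*$" ; }
        /^[[:space:]]/ { if (matched) print; next; }
        { matched = 0; if (!match(tolower($1), h)) next;
          matched = 1; sub(/^[^:]*:[[:space:]]*/, ""); print; }'
}

# Hypothetical sample message with two Cc fields in different case.
msg='Cc: one@example.org
To: x@example.org
cc: two@example.org

body'

printf '%s\n' "$msg" | extract_first Cc
# prints: one@example.org

printf '%s\n' "$msg" | extract_all Cc
# prints:
# one@example.org
# two@example.org
```

Neither variant stops at the blank line that ends the header block, so a
body line that looks like the requested header would also match; callers
that feed whole messages may want to account for that.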


Philip Guenther
