retitle 158481 mawk: Crash when using gsub
tags 158481 + upstream
thanks 

Anders Boström wrote:

> The following very short script makes mawk seg-fault:
> 
> cat foo | mawk '{ gsub(/../, "0x&,"); print }' > /dev/null
> 
> when foo is a large file with only one line (a hex-encoded
> binary). gawk works fine.

Morgon Kanter wrote:

> At least we know the problem. Seems like a stack overflow error to me.

Yes.  In case someone is interested in working on this:

gsub() was written in a bit of a quick and dirty way as far as I can
tell.  It is recursive, but it does not need to be.

Overview of current gsub():

 1. Look for a match.  No match → hoorah, we are done: the string
    is returned as is.
 2. Copy the replacement string.
 3. Okay, we found a match.
    IF the match was an empty match at the start of the string
    and such matches are disallowed (they are allowed for the
    first match):
      case 1: The whole string is empty.  Throw away the replacement
              string.  The modified string will be empty, too.

      case 2: The regexp to match was anchored.  Throw away the
              replacement string.  There can be no more matches;
              the modified string is the current string.

      case 3: Unanchored match.  Throw away the replacement string.
              The match was disallowed, so we have to start matching
              with the next character.  So save the first character
              from the source string in the buffer for the replacement
              string and call gsub() to deal with the rest.
    OTHERWISE (i.e., the match is not at the start of the string,
    or empty matches at start are allowed):
      a. Front consists of all characters up to the match.
      b. Figure out the value to substitute (replacing the &s
         with copies of the matched string).
      c. Call gsub() on the rest of the string.
 4. Concatenate the three pieces (front, middle, and back) and
    return the result.

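So every match costs one stack frame, and each frame holds on to
its own front and middle until the recursion unwinds.  With one
huge line and a match every couple of characters, that is enough
to overflow the stack.  It would be perfectly reasonable to collect
all the fronts and middles in one go, in a single loop.  If the
buffer that collects them is allowed to grow quickly enough (say,
by doubling), that would also avoid a lot of unnecessary copying.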
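
For what it's worth, here is a rough sketch of the kind of loop I have
in mind.  This is not mawk code: it uses POSIX regcomp()/regexec() as a
stand-in for mawk's own matcher, and struct buf, buf_append(),
emit_repl(), and gsub_iter() are names made up for illustration.  The
empty-match rule is my reading of the usual awk behaviour (no empty
match directly after a non-empty one), so treat it as an assumption.

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable output buffer; not mawk's own string type. */
struct buf { char *data; size_t len, cap; };

static int buf_append(struct buf *b, const char *s, size_t n)
{
    if (b->len + n + 1 > b->cap) {
        size_t cap = b->cap ? b->cap : 64;
        char *p;
        while (cap < b->len + n + 1)
            cap *= 2;                   /* doubling keeps total copying linear */
        p = realloc(b->data, cap);
        if (!p)
            return -1;
        b->data = p;
        b->cap = cap;
    }
    memcpy(b->data + b->len, s, n);
    b->len += n;
    b->data[b->len] = '\0';
    return 0;
}

/* Append repl, expanding & to the matched text and \& to a literal &. */
static int emit_repl(struct buf *out, const char *repl,
                     const char *ms, size_t mlen)
{
    const char *r;
    for (r = repl; *r; r++) {
        int rc;
        if (r[0] == '\\' && r[1] == '&')
            rc = buf_append(out, ++r, 1);
        else if (*r == '&')
            rc = buf_append(out, ms, mlen);
        else
            rc = buf_append(out, r, 1);
        if (rc != 0)
            return -1;
    }
    return 0;
}

/* Iterative global substitution: one loop, one output buffer, no recursion.
 * Returns a malloc'd string (caller frees), or NULL on error. */
char *gsub_iter(const char *pattern, const char *repl, const char *src)
{
    regex_t re;
    regmatch_t m;
    struct buf out = {0};
    const char *p = src;
    const char *after_match = NULL;     /* end of the last non-empty match */
    int ok;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return NULL;
    ok = buf_append(&out, "", 0) == 0;  /* so an empty result is still "" */

    /* REG_NOTBOL: past the first call, ^ must not match at p. */
    while (ok && regexec(&re, p, 1, &m, p == src ? 0 : REG_NOTBOL) == 0) {
        const char *ms = p + m.rm_so;
        const char *me = p + m.rm_eo;

        ok = buf_append(&out, p, (size_t)(ms - p)) == 0;        /* front */
        if (ok && (me > ms || ms != after_match))               /* middle */
            ok = emit_repl(&out, repl, ms, (size_t)(me - ms)) == 0;
        if (me > ms) {
            p = after_match = me;       /* continue after the match */
        } else if (*ms == '\0') {
            p = ms;                     /* empty match at end of input */
            break;
        } else {
            ok = ok && buf_append(&out, ms, 1) == 0;  /* step over one char */
            p = ms + 1;
        }
    }
    ok = ok && buf_append(&out, p, strlen(p)) == 0;             /* back */
    regfree(&re);
    if (!ok) {
        free(out.data);
        return NULL;
    }
    return out.data;
}
```

With this, gsub_iter("..", "0x&,", "deadbeef") should give
"0xde,0xad,0xbe,0xef,", and the stack depth no longer depends on the
number of matches.  One caveat: regexec() may rescan the remaining
input on every call, so a real fix would want to use mawk's own matcher
rather than this stand-in.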

Hope that helps,
Jonathan


