retitle 158481 mawk: Crash when using gsub
tags 158481 + upstream
thanks

Anders Boström wrote:

> The following very short script makes mawk seg-fault:
>
>   cat foo | mawk '{ gsub(/../, "0x&,"); print }' > /dev/null
>
> when foo is a large file with only one line (an hex-encoded
> binary). gawk works fine.

Morgon Kanter wrote:

> At least we know the problem. Seems like a stack overflow error to me.

Yes.

In case someone is interested in working on this: gsub() was written
in a bit of a quick and dirty way as far as I can tell. It is
recursive, but it does not need to be.

Overview of current gsub():

 1. Look for a match. No match → hoorah!

 2. Copy the replacement string.

 3. Okay, we found a match.

    IF the match was an empty match at the start of the string and
    such matches are disallowed (they are allowed for the first
    match):

      case 1: The whole string is empty.
              Throw away the replacement string. The modified string
              will be empty, too.

      case 2: The regexp to match was anchored.
              Throw away the replacement string. There can be no more
              matches; the modified string is the current string.

      case 3: Unanchored match.
              Throw away the replacement string. The match was
              disallowed, so we have to start matching with the next
              character. So save the first character from the source
              string in the buffer for the replacement string and
              call gsub() to deal with the rest.

    OTHERWISE (i.e., the match is not at the start of the string, or
    empty matches at start are allowed):

      a. Front consists of all characters up to the match.
      b. Figure out the value to substitute (replacing the &s with
         copies of the matched string).
      c. Call gsub() on the rest of the string.

 4. Concatenate the three pieces (front, middle, and back) and return
    the result.

So there are many fronts and middles for previous stack frames being
collected, but it would be perfectly reasonable to collect them all in
one go. If the string to collect them is allowed to grow quickly
enough, it would avoid a lot of unnecessary copying.
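
To make the "collect them all in one go" idea concrete, here is a
rough sketch of a non-recursive loop. It is not mawk's code: it uses
POSIX regcomp()/regexec() in place of mawk's own matcher, and the
names gsub_iter and struct outbuf are made up for illustration. The
empty-match handling follows my reading of the overview above, so
treat it as an assumption rather than a specification.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Growable output buffer; capacity doubles, so appends are amortized O(1). */
struct outbuf {
    char *data;
    size_t len;
    size_t cap;
};

/* Append n bytes of s and keep the buffer NUL-terminated. 0 on success. */
static int outbuf_append(struct outbuf *b, const char *s, size_t n)
{
    if (b->len + n + 1 > b->cap) {
        size_t cap = b->cap ? b->cap : 64;
        while (cap < b->len + n + 1)
            cap *= 2;
        char *p = realloc(b->data, cap);
        if (p == NULL)
            return -1;
        b->data = p;
        b->cap = cap;
    }
    memcpy(b->data + b->len, s, n);
    b->len += n;
    b->data[b->len] = '\0';
    return 0;
}

/*
 * Hypothetical helper (not mawk's API): returns a malloc()ed copy of src
 * with every match of the POSIX extended regex replaced by repl, where
 * "&" in repl stands for the matched text. Returns NULL on a bad pattern
 * or on allocation failure. The loop uses constant stack space.
 */
char *gsub_iter(const char *pattern, const char *repl, const char *src)
{
    regex_t re;
    regmatch_t m;
    struct outbuf out = {NULL, 0, 0};
    const char *p = src;
    int after_match = 0;  /* previous step consumed a non-empty match */
    int failed;

    assert(pattern != NULL && repl != NULL && src != NULL);
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return NULL;
    failed = outbuf_append(&out, "", 0);  /* result is never NULL on success */

    while (!failed) {
        /* Past the first position, "^" must no longer match. */
        int flags = (p == src) ? 0 : REG_NOTBOL;
        if (regexec(&re, p, 1, &m, flags) != 0) {
            failed = outbuf_append(&out, p, strlen(p));  /* copy the tail */
            break;
        }
        const char *ms = p + m.rm_so;
        size_t mlen = (size_t)(m.rm_eo - m.rm_so);

        failed = outbuf_append(&out, p, (size_t)m.rm_so);  /* front */

        /* An empty match right after a real match is disallowed. */
        if (!(mlen == 0 && after_match && m.rm_so == 0)) {
            for (const char *r = repl; *r != '\0' && !failed; r++)  /* middle */
                failed = (*r == '&') ? outbuf_append(&out, ms, mlen)
                                     : outbuf_append(&out, r, 1);
        }

        if (mlen > 0) {
            p = ms + mlen;
            after_match = 1;
        } else {
            if (*ms == '\0')
                break;             /* empty match at end of string: done */
            if (!failed)
                failed = outbuf_append(&out, ms, 1);  /* step over one char */
            p = ms + 1;
            after_match = 0;
        }
    }

    regfree(&re);
    if (failed) {
        free(out.data);
        return NULL;
    }
    return out.data;
}
```

With this, gsub_iter("..", "0x&,", "deadbeef") yields
"0xde,0xad,0xbe,0xef,", and the stack depth no longer depends on the
number of matches. The sketch only addresses the recursion; regexec()
may rescan the remaining input on every call, so a real fix would
still want mawk's own matcher working on a pointer and a length.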

Hope that helps,
Jonathan