tags 100808 - fixed-upstream
tags 100808 + upstream
thanks

Jonathan Nieder wrote:

> The solution: after a legitimate match, reject anchored matches just
> as if they were empty. Thomas Dickey implemented this fix in
> mawk 1.3.3-20090727.

Sigh. Since anchors can appear in the middle of a regex, it is not
nearly so simple as that.

Here’s a fixup to the patch from 1.3.3-20090727. That patch only
addressed the problem for regexps anchored at the start that match the
empty string.

The POSIX regexec() function provides a REG_NOTBOL flag for cases like
this. mawk needs to expose it when using an external regex and
implement it for the internal one.

-- %< --
Subject: [PATCH] gsub: Fix match of anchored regexp

A start anchor in the first argument to gsub is being ignored in some
cases:

  $ echo aa | mawk '{ str = $0; gsub(/^a/, "MATCH", str); print str; }'
  MATCHMATCH

To handle repeat matches, gsub runs recursively on the rest of the
string. If the rest of the string starts with the anchored
expression, gsub can make spurious matches. So check for this case
and reject the matches.

The same problem could occur if an _unanchored_ regexp matches the
empty string: after a legitimate match, gsub would run recursively on
the rest of the string and match again. Luckily, Mike Brennan already
handles this and has skipped those matches since version 0.97 (maybe
even earlier). ;-)

The solution: after a legitimate match, reject anchored matches just
as if they were empty. This was half-implemented in mawk
1.3.3-20090727, but the fix only applied to anchored _empty_ matches.

Even this patch is not enough to handle complicated examples such as

  echo aa | mawk '{ str = $0; gsub(/(^a)/, "MATCH", str); print str; }'
---
 bi_funct.c    |    9 +++++----
 test/mawktest |   18 ++++++++++++++++++
 2 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/bi_funct.c b/bi_funct.c
index 0f664fa..25ff59c 100644
--- a/bi_funct.c
+++ b/bi_funct.c
@@ -914,16 +914,17 @@ gsub(PTR re, CELL * repl, char *target, unsigned target_len, int flag)
 
     cellcpy(&xrepl, repl);
 
-    if (!flag && middle_len == 0 && middle == target) {
+    if (!flag && middle == target && isAnchored(re)) {
+        /* False anchored match. */
+        repl_destroy(&xrepl);
+        return new_STRING1(target, target_len);
+    } else if (!flag && middle_len == 0 && middle == target) {
         /* match at front that's not allowed */
 
         if (*target == 0) {     /* target is empty string */
             repl_destroy(&xrepl);
             null_str.ref_cnt++;
             return &null_str;
-        } else if (1 && isAnchored(re)) {
-            repl_destroy(&xrepl);
-            return new_STRING1(target, target_len);
         } else {
             char xbuff[2];
 
diff --git a/test/mawktest b/test/mawktest
index af75a01..5c266fa 100755
--- a/test/mawktest
+++ b/test/mawktest
@@ -197,6 +197,24 @@ Finish "array test"
 
 #################################
 
+Begin "testing built-in functions"
+
+echo aa | LC_ALL=C $PROG '{str=$0; gsub(/^ */, "MATCH", str); print str}' > $STDOUT
+echo MATCHaa | cmp -s - $STDOUT || Fail "empty match at start"
+
+echo aa | LC_ALL=C $PROG '{str=$0; gsub(/^a/, "MATCH", str); print str}' > $STDOUT
+echo MATCHa | cmp -s - $STDOUT || Fail "anchored match"
+
+echo aa | LC_ALL=C $PROG '{str=$0; gsub(/a/, "MATCH", str); print str}' > $STDOUT
+echo MATCHMATCH | cmp -s - $STDOUT || Fail "non-anchored match"
+
+echo abb | LC_ALL=C $PROG '{str=$0; gsub(/a?/, "MATCH", str); print str}' > $STDOUT
+echo MATCHbMATCHbMATCH | cmp -s - $STDOUT || Fail "empty match"
+
+Finish "testing built-in functions"
+
+#################################
+
 Begin "testing function calls and general stress test"
 
 LC_ALL=C $PROG -f $SRC/examples/decl.awk $dat | cmp -s - decl-awk.out || Fail
-- 
debian.1.7.0.1.96.ged606
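
For illustration only, here is a rough sketch of the REG_NOTBOL idea
mentioned above, written against the POSIX <regex.h> interface rather
than mawk's internal matcher. The function name count_matches() is
made up for this example and is not part of mawk; the empty-match
handling is deliberately simplified and does not reproduce mawk's
rules. The point is only that once the scan has moved past the start
of the string, passing REG_NOTBOL stops "^" from matching again, even
when the anchor sits inside a group as in /(^a)/.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper (not mawk code): count the matches a gsub-style
 * left-to-right scan would find for an extended regex in text.
 * Returns -1 if the pattern does not compile.
 */
static int count_matches(const char *pattern, const char *text)
{
    regex_t re;
    regmatch_t m;
    const char *p = text;
    int eflags = 0;             /* first attempt: p really is start of line */
    int count = 0;

    assert(pattern != NULL && text != NULL);

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    while (regexec(&re, p, 1, &m, eflags) == 0) {
        count++;
        if (m.rm_eo == m.rm_so) {
            /* Empty match: step one character so the loop advances.
             * Simplified; mawk has its own rules for this case. */
            if (p[m.rm_eo] == '\0')
                break;
            p += m.rm_eo + 1;
        } else {
            p += m.rm_eo;
        }
        /* Everything after the first attempt starts mid-string,
         * so "^" must no longer match at p. */
        eflags = REG_NOTBOL;
    }

    regfree(&re);
    return count;
}
```

With a conforming regexec(), count_matches("^a", "aa") and
count_matches("(^a)", "aa") should both report one match, while
count_matches("a", "aa") reports two, which is the behaviour the
mawktest cases above expect from gsub.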