tags 100808 - fixed-upstream
tags 100808 + upstream
thanks

Jonathan Nieder wrote:

> The solution: after a legitimate match, reject anchored matches just
> as if they were empty.  Thomas Dickey implemented this fix in
> mawk 1.3.3-20090727.

Sigh.  Since anchors can appear in the middle of a regex, it is not
nearly so simple as that.

Here’s a fixup to the patch from 1.3.3-20090727.  That patch only
addressed the problem for regexps anchored at the start that match the
empty string.

The POSIX regexec() function provides a REG_NOTBOL flag for cases
like this.  mawk needs to expose it when using an external regex
and implement it for the internal one.
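
For illustration, here is a rough sketch of what a gsub-style loop
built on REG_NOTBOL might look like.  This is not mawk code: the names
gsub_sketch() and put() are made up for this example, and it assumes a
POSIX <regex.h> with extended regexps.  Only the first search starts
at the real beginning of the string; every later search passes
REG_NOTBOL so that '^' cannot match, wherever it sits in the regex.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Append n bytes of s to out, always leaving room for the final NUL. */
static int put(char *out, size_t outsz, size_t *used, const char *s, size_t n)
{
    if (n >= outsz - *used)
        return -1;
    memcpy(out + *used, s, n);
    *used += n;
    return 0;
}

/* Hypothetical helper (not part of mawk): replace every match of
 * pattern in input with repl, writing the result to out.
 * Returns 0 on success, -1 on a bad pattern or a too-small buffer. */
int gsub_sketch(const char *pattern, const char *repl,
                const char *input, char *out, size_t outsz)
{
    regex_t re;
    regmatch_t m;
    const char *p = input;
    size_t used = 0;
    int after_match = 0;    /* did the previous match end exactly at p? */
    int rc = 0;

    assert(out != NULL && outsz > 0);
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    while (rc == 0) {
        /* Past the true start of the string, '^' must not match. */
        int eflags = (p == input) ? 0 : REG_NOTBOL;

        if (regexec(&re, p, 1, &m, eflags) != 0)
            break;
        size_t pre = (size_t) m.rm_so;
        size_t mlen = (size_t) (m.rm_eo - m.rm_so);

        if (mlen == 0 && pre == 0 && after_match) {
            /* Empty match touching the previous match: not allowed.
             * Copy one character through and try again. */
            if (*p == '\0')
                break;
            rc = put(out, outsz, &used, p, 1);
            p++;
            after_match = 0;
            continue;
        }

        if (put(out, outsz, &used, p, pre) != 0 ||
            put(out, outsz, &used, repl, strlen(repl)) != 0) {
            rc = -1;
            break;
        }
        p += pre + mlen;
        after_match = 1;

        if (mlen == 0) {
            /* An empty match consumes nothing, so step over one
             * character by hand to guarantee progress. */
            if (*p == '\0')
                break;
            rc = put(out, outsz, &used, p, 1);
            p++;
            after_match = 0;
        }
    }

    if (rc == 0)
        rc = put(out, outsz, &used, p, strlen(p));
    out[used] = '\0';
    regfree(&re);
    return rc;
}
```

With this approach there is no need for an isAnchored() test at all,
since the regex engine itself decides whether an anchor may match at
the current position.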

-- %< --
Subject: [PATCH] gsub: Fix match of anchored regexp

A start anchor in the first argument to gsub is being ignored in some
cases:

 $ echo aa | mawk '{ str = $0; gsub(/^a/, "MATCH", str); print str; }'
 MATCHMATCH

To handle repeat matches, gsub runs recursively on the rest of the
string.  If the rest of the string starts with the anchored
expression, gsub can make spurious matches.  So check for this case
and reject the matches.

The same problem could occur if an _unanchored_ regexp matches the
empty string: after a legitimate match, gsub would run recursively on
the rest of the string and match again.  Luckily, Mike Brennan already
handles this and has skipped those matches since version 0.97 (maybe
even earlier). ;-)

The solution: after a legitimate match, reject anchored matches just
as if they were empty.  This was half-implemented in mawk
1.3.3-20090727, but the fix only applied to anchored _empty_ matches.

Even this patch is not enough to handle complicated examples such
as

 echo aa | mawk '{ str = $0; gsub(/(^a)/, "MATCH", str); print str; }'
---
 bi_funct.c    |    9 +++++----
 test/mawktest |   18 ++++++++++++++++++
 2 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/bi_funct.c b/bi_funct.c
index 0f664fa..25ff59c 100644
--- a/bi_funct.c
+++ b/bi_funct.c
@@ -914,16 +914,17 @@ gsub(PTR re, CELL * repl, char *target, unsigned target_len, int flag)
 
     cellcpy(&xrepl, repl);
 
-    if (!flag && middle_len == 0 && middle == target) {
+    if (!flag && middle == target && isAnchored(re)) {
+       /* False anchored match. */
+       repl_destroy(&xrepl);
+       return new_STRING1(target, target_len);
+    } else if (!flag && middle_len == 0 && middle == target) {
        /* match at front that's not allowed */
 
        if (*target == 0) {     /* target is empty string */
            repl_destroy(&xrepl);
            null_str.ref_cnt++;
            return &null_str;
-       } else if (1 && isAnchored(re)) {
-           repl_destroy(&xrepl);
-           return new_STRING1(target, target_len);
        } else {
            char xbuff[2];
 
diff --git a/test/mawktest b/test/mawktest
index af75a01..5c266fa 100755
--- a/test/mawktest
+++ b/test/mawktest
@@ -197,6 +197,24 @@ Finish "array test"
 
 #################################
 
+Begin "testing built-in functions"
+
+echo aa | LC_ALL=C $PROG '{str=$0; gsub(/^ */, "MATCH", str); print str}' > $STDOUT
+echo MATCHaa | cmp -s - $STDOUT || Fail "empty match at start"
+
+echo aa | LC_ALL=C $PROG '{str=$0; gsub(/^a/, "MATCH", str); print str}' > $STDOUT
+echo MATCHa | cmp -s - $STDOUT || Fail "anchored match"
+
+echo aa | LC_ALL=C $PROG '{str=$0; gsub(/a/, "MATCH", str); print str}' > $STDOUT
+echo MATCHMATCH | cmp -s - $STDOUT || Fail "non-anchored match"
+
+echo abb | LC_ALL=C $PROG '{str=$0; gsub(/a?/, "MATCH", str); print str}' > $STDOUT
+echo MATCHbMATCHbMATCH | cmp -s - $STDOUT || Fail "empty match"
+
+Finish "testing built-in functions"
+
+#################################
+
 Begin "testing function calls and general stress test"
 
 LC_ALL=C $PROG -f $SRC/examples/decl.awk $dat | cmp -s - decl-awk.out || Fail
-- 
debian.1.7.0.1.96.ged606



