On Fri, Apr 02, 2021 at 01:57:07PM +0200, Otto Moerbeek wrote:

> On Tue, Feb 23, 2021 at 04:16:09AM -0800, Michael Paoli wrote:
> 
> > > Synopsis:      Basic Regular Expression (BRE) bug in \{m,n\} with \(\) and \n
> > > Category:      library
> > > Environment:
> >         System      : OpenBSD 6.7
> >         Details     : OpenBSD 6.7 (GENERIC) #7: Wed Jan  6 15:19:25 MST 2021
> > [email protected]:/usr/src/sys/arch/amd64/compile/GENERIC
> > 
> >         Architecture: OpenBSD.amd64
> >         Machine     : amd64
> > > Description:
> >         Certain BRE expressions fail/misbehave unexpectedly.
> >         The failures are the same in both grep and sed (without -E).
> >         The failures only occur with certain combinations of
> >         \{\}, \(\), and \n (where n is a digit) syntax; dropping any one
> >         of those generally avoids triggering the bug.
> >         The bug/error can be seen most clearly in unexpected
> >         behavior of the \{m,n\} portion in the given context.
> >         If more of the (apparently dependent) context is removed,
> >         the bug doesn't show up.  E.g. some of the clearest cases
> >         involve replacing * with \{0,\} in the BRE, and getting
> >         quite unexpected results (one would expect the results
> >         to be the same).  These same BREs work under both
> >         Solaris 11 and GNU/Linux with their sed and grep.
> > > How-To-Repeat:
> >         This example code illustrates the problem.  It shows both
> >         cases where the bug appears and slightly differing contexts
> >         where the bug does not occur.
> >         In each of these cases, the output should be the STRING
> >         we set/echo into grep/sed where we use our BRE, but in the bug
> >         cases we get no output.
> >         It's also suggested that test cases be added to the code to
> >         catch possible regressions, should the issue recur.  :-)
> >         Example code to show where bug does (and doesn't) show up:
> >         (
> >                 exec 2>&1
> >                 set -- \
> >                         'YYxx' 'Y*\(x\)\1' \
> >                         'YYxx' 'Y\{0,\}\(x\)\1' \
> >                         'YYxx' 'Y\{2,\}\(x\)\1' \
> >                         'YYxx' 'Y\{0,\}\(x\)' \
> >                         'YYxx' 'Y\{2,\}x' \
> >                         'YYxx' 'Y\{2,\}x\{1,\}' \
> >                         'YYxx' 'Y\{2,\}x\{0,\}' \
> >                         'YYxxz' 'Y\{2,\}x\{0,\}z' \
> >                         'YYxxz' 'Y\{0,\}x\{0,\}z' \
> >                         'YYxyxy' 'Y\{2,\}\(xy\)\1' \
> >                         'YYxyxy' 'Y\{0,\}\(xy\)\1' \
> >                         'YYxyxy' 'Y*\(xy\)\1' \
> >                         'YYxyxy' 'Y\{0,\}\(xy\)xy'
> >                 while [ "$#" -ge 2 ]
> >                 do
> >                         STRING="$1"; shift; BRE="$1"; shift
> >                         set -x
> >                         echo "$STRING" | grep -e "$BRE"
> >                         echo "$STRING" | sed -ne "s/$BRE/&/p"
> >                         set +x
> >                 done
> >         )
> >         Example run of above code.  Bug is present where our
> >         STRING echoed into grep/sed fails to appear in the
> >         output:
> >         + echo YYxx
> >         + grep -e Y*\(x\)\1
> >         YYxx
> >         + echo YYxx
> >         + sed -ne s/Y*\(x\)\1/&/p
> >         YYxx
> >         + set +x
> >         + echo YYxx
> >         + grep -e Y\{0,\}\(x\)\1
> >         + echo YYxx
> >         + sed -ne s/Y\{0,\}\(x\)\1/&/p
> >         + set +x
> >         + echo YYxx
> >         + grep -e Y\{2,\}\(x\)\1
> >         YYxx
> >         + echo YYxx
> >         + sed -ne s/Y\{2,\}\(x\)\1/&/p
> >         YYxx
> >         + set +x
> >         + echo YYxx
> >         + grep -e Y\{0,\}\(x\)
> >         YYxx
> >         + echo YYxx
> >         + sed -ne s/Y\{0,\}\(x\)/&/p
> >         YYxx
> >         + set +x
> >         + echo YYxx
> >         + grep -e Y\{2,\}x
> >         YYxx
> >         + echo YYxx
> >         + sed -ne s/Y\{2,\}x/&/p
> >         YYxx
> >         + set +x
> >         + echo YYxx
> >         + grep -e Y\{2,\}x\{1,\}
> >         YYxx
> >         + echo YYxx
> >         + sed -ne s/Y\{2,\}x\{1,\}/&/p
> >         YYxx
> >         + set +x
> >         + echo YYxx
> >         + grep -e Y\{2,\}x\{0,\}
> >         YYxx
> >         + echo YYxx
> >         + sed -ne s/Y\{2,\}x\{0,\}/&/p
> >         YYxx
> >         + set +x
> >         + echo YYxxz
> >         + grep -e Y\{2,\}x\{0,\}z
> >         YYxxz
> >         + echo YYxxz
> >         + sed -ne s/Y\{2,\}x\{0,\}z/&/p
> >         YYxxz
> >         + set +x
> >         + echo YYxxz
> >         + grep -e Y\{0,\}x\{0,\}z
> >         YYxxz
> >         + echo YYxxz
> >         + sed -ne s/Y\{0,\}x\{0,\}z/&/p
> >         YYxxz
> >         + set +x
> >         + echo YYxyxy
> >         + grep -e Y\{2,\}\(xy\)\1
> >         YYxyxy
> >         + echo YYxyxy
> >         + sed -ne s/Y\{2,\}\(xy\)\1/&/p
> >         YYxyxy
> >         + set +x
> >         + echo YYxyxy
> >         + grep -e Y\{0,\}\(xy\)\1
> >         + echo YYxyxy
> >         + sed -ne s/Y\{0,\}\(xy\)\1/&/p
> >         + set +x
> >         + echo YYxyxy
> >         + grep -e Y*\(xy\)\1
> >         YYxyxy
> >         + echo YYxyxy
> >         + sed -ne s/Y*\(xy\)\1/&/p
> >         YYxyxy
> >         + set +x
> >         + echo YYxyxy
> >         + grep -e Y\{0,\}\(xy\)xy
> >         YYxyxy
> >         + echo YYxyxy
> >         + sed -ne s/Y\{0,\}\(xy\)xy/&/p
> >         YYxyxy
> >         + set +x
> > > Fix:
> >         No known general work-around
> > 
> > 
> 
> Hi,
> 
> I can reproduce on current. Do you have an idea if NetBSD or FreeBSD
> suffer from the same?
> 
>       -Otto
> 

These are the tests incorporated into our regress tests:

        -Otto

Index: tests
===================================================================
RCS file: /cvs/src/regress/lib/libc/regex/tests,v
retrieving revision 1.9
diff -u -p -r1.9 tests
--- tests       28 Dec 2020 21:41:55 -0000      1.9
+++ tests       2 Apr 2021 14:16:59 -0000
@@ -595,3 +595,18 @@ a?b        -       ab      ab
 # FreeBSD PR 130504
 (.|())(b)      -       ab      ab
 (()|.)(b)      -       ab      ab
+
+# Some BRE cases where \{0,\} makes a backref go wrong, as reported by Michael Paoli
+Y*\(x\)\1      b       YYxx    YYxx
+Y\{2,\}\(x\)\1 b       YYxx    YYxx
+# Fails currently
+#Y\{0,\}\(x\)\1        b       YYxx    YYxx
+Y\{0,\}\(x\)   b       YYxx    YYx
+Y\{2,\}x\{1,\} b       YYxx    YYxx
+Y\{2,\}x\{0,\}z        b       YYxxz   YYxxz
+Y\{0,\}x\{0,\}z        b       YYxxz   YYxxz
+Y\{2,\}\(xy\)\1        b       YYxyxy  YYxyxy
+# Fails currently
+#Y\{0,\}\(xy\)\1       b       YYxyxy  YYxyxy
+Y*\(xy\)\1     b       YYxyxy  YYxyxy
+Y\{0,\}\(xy\)xy        b       YYxyxy  YYxyxy
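
For anyone who wants to check the two disabled cases by hand, here is a
minimal sketch, not part of the regress suite. The helper names check_bre
and same_result are made up for illustration. It only drives grep in BRE
mode, as in the original report, and compares each \{0,\} pattern against
its * equivalent:

```sh
#!/bin/sh
# check_bre STRING BRE  (hypothetical helper, not in the regress suite)
# Prints "match" if grep in BRE mode selects STRING, "no match" otherwise.
check_bre() {
	if printf '%s\n' "$1" | grep -e "$2" >/dev/null 2>&1; then
		echo "match"
	else
		echo "no match"
	fi
}

# same_result STRING BRE1 BRE2  (hypothetical helper)
# Prints "same" when both BREs agree on STRING, "differ" otherwise.
same_result() {
	if [ "$(check_bre "$1" "$2")" = "$(check_bre "$1" "$3")" ]; then
		echo "same"
	else
		echo "differ"
	fi
}

# The two cases commented out in the regress file, each paired with
# the equivalent pattern written with * instead of \{0,\}.
same_result 'YYxx'   'Y*\(x\)\1'  'Y\{0,\}\(x\)\1'
same_result 'YYxyxy' 'Y*\(xy\)\1' 'Y\{0,\}\(xy\)\1'
```

Going by the trace in the report, an affected system should print "differ"
for both lines, since the * form matches and the \{0,\} form does not.
Once the bug is fixed, both should print "same".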
