On Fri, Apr 02, 2021 at 01:57:07PM +0200, Otto Moerbeek wrote:
> On Tue, Feb 23, 2021 at 04:16:09AM -0800, Michael Paoli wrote:
>
> > > Synopsis: Basic Regular Expression (BRE) bug in \{m,n\} with \(\)
> > > and \n
> > > Category: library
> > > Environment:
> > System : OpenBSD 6.7
> > Details : OpenBSD 6.7 (GENERIC) #7: Wed Jan 6 15:19:25 MST 2021
> > [email protected]:/usr/src/sys/arch/amd64/compile/GENERIC
> >
> > Architecture: OpenBSD.amd64
> > Machine : amd64
> > > Description:
> > Certain BRE expressions fail/misbehave unexpectedly.
> > The failures are the same in both grep and sed (without -E).
> > The failures only occur with certain combinations of use of:
> > \{\}, \(\), \n (where n is a digit) syntax; dropping any one
> > of those generally fails to trigger the bug.
> > The bug/error can be seen most clearly in unexpected
> > behavior of the \{m,n\} portion in the given context.
> > If more of the (apparently dependent) context is removed,
> > the bug doesn't show up. E.g. some of the clearest cases
> > involve replacing * with \{0,\} in the BRE, and getting
> > quite unexpected results (one would expect the results
> > to be the same). These same BREs work under both
> > Solaris 11 and GNU/Linux with their sed and grep.
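In a BRE, `*` and `\{0,\}` both mean "zero or more repetitions", so for any input string the two forms should agree on whether it matches. The sketch below checks exactly that for a pair of BREs. `same_match` is a hypothetical helper written for illustration; it is not part of grep(1) or of the original report.

```shell
#!/bin/sh
# Hypothetical helper: report whether grep(1) reaches the same match /
# no-match verdict for two BREs that should be equivalent
# (here "*" versus "\{0,\}"). Returns non-zero if they disagree.
same_match() {
    string=$1 bre1=$2 bre2=$3
    if printf '%s\n' "$string" | grep -q -e "$bre1"; then r1=match; else r1=nomatch; fi
    if printf '%s\n' "$string" | grep -q -e "$bre2"; then r2=match; else r2=nomatch; fi
    if [ "$r1" = "$r2" ]; then
        printf 'same      %s | %s (%s)\n' "$bre1" "$bre2" "$r1"
    else
        printf 'DIFFERENT %s (%s) | %s (%s)\n' "$bre1" "$r1" "$bre2" "$r2"
        return 1
    fi
}

# Two pairs taken from the reproducer below.
same_match YYxx   'Y*\(x\)\1'  'Y\{0,\}\(x\)\1'
same_match YYxyxy 'Y*\(xy\)\1' 'Y\{0,\}\(xy\)\1'
```

With a correct regex implementation both calls should print a "same ... (match)" line. Going by the trace in this report, an affected OpenBSD 6.7 system would be expected to print DIFFERENT for both pairs, since only the `*` forms produced output there.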
> > > How-To-Repeat:
> > This example code can be used to illustrate the problem,
> > and both show cases where the bug shows up, and also slightly
> > differing contexts where the bug does not occur.
> > In each of these cases, the output should be the STRING
> > we set/echo into grep/sed where we use our BRE, but in the bug
> > cases we get no output.
> > It's also suggested that test cases be added to the code to catch
> > possible regression bugs, should the issue recur. :-)
> > Example code to show where bug does (and doesn't) show up:
> > (
> > exec 2>&1
> > set -- \
> > 'YYxx' 'Y*\(x\)\1' \
> > 'YYxx' 'Y\{0,\}\(x\)\1' \
> > 'YYxx' 'Y\{2,\}\(x\)\1' \
> > 'YYxx' 'Y\{0,\}\(x\)' \
> > 'YYxx' 'Y\{2,\}x' \
> > 'YYxx' 'Y\{2,\}x\{1,\}' \
> > 'YYxx' 'Y\{2,\}x\{0,\}' \
> > 'YYxxz' 'Y\{2,\}x\{0,\}z' \
> > 'YYxxz' 'Y\{0,\}x\{0,\}z' \
> > 'YYxyxy' 'Y\{2,\}\(xy\)\1' \
> > 'YYxyxy' 'Y\{0,\}\(xy\)\1' \
> > 'YYxyxy' 'Y*\(xy\)\1' \
> > 'YYxyxy' 'Y\{0,\}\(xy\)xy'
> > while [ "$#" -ge 2 ]
> > do
> > STRING="$1"; shift; BRE="$1"; shift
> > set -x
> > echo "$STRING" | grep -e "$BRE"
> > echo "$STRING" | sed -ne "s/$BRE/&/p"
> > set +x
> > done
> > )
> > Example run of above code. Bug is present where our
> > STRING echoed into grep/sed fails to appear in the
> > output:
> > + echo YYxx
> > + grep -e Y*\(x\)\1
> > YYxx
> > + echo YYxx
> > + sed -ne s/Y*\(x\)\1/&/p
> > YYxx
> > + set +x
> > + echo YYxx
> > + grep -e Y\{0,\}\(x\)\1
> > + echo YYxx
> > + sed -ne s/Y\{0,\}\(x\)\1/&/p
> > + set +x
> > + echo YYxx
> > + grep -e Y\{2,\}\(x\)\1
> > YYxx
> > + echo YYxx
> > + sed -ne s/Y\{2,\}\(x\)\1/&/p
> > YYxx
> > + set +x
> > + echo YYxx
> > + grep -e Y\{0,\}\(x\)
> > YYxx
> > + echo YYxx
> > + sed -ne s/Y\{0,\}\(x\)/&/p
> > YYxx
> > + set +x
> > + echo YYxx
> > + grep -e Y\{2,\}x
> > YYxx
> > + echo YYxx
> > + sed -ne s/Y\{2,\}x/&/p
> > YYxx
> > + set +x
> > + echo YYxx
> > + grep -e Y\{2,\}x\{1,\}
> > YYxx
> > + echo YYxx
> > + sed -ne s/Y\{2,\}x\{1,\}/&/p
> > YYxx
> > + set +x
> > + echo YYxx
> > + grep -e Y\{2,\}x\{0,\}
> > YYxx
> > + echo YYxx
> > + sed -ne s/Y\{2,\}x\{0,\}/&/p
> > YYxx
> > + set +x
> > + echo YYxxz
> > + grep -e Y\{2,\}x\{0,\}z
> > YYxxz
> > + echo YYxxz
> > + sed -ne s/Y\{2,\}x\{0,\}z/&/p
> > YYxxz
> > + set +x
> > + echo YYxxz
> > + grep -e Y\{0,\}x\{0,\}z
> > YYxxz
> > + echo YYxxz
> > + sed -ne s/Y\{0,\}x\{0,\}z/&/p
> > YYxxz
> > + set +x
> > + echo YYxyxy
> > + grep -e Y\{2,\}\(xy\)\1
> > YYxyxy
> > + echo YYxyxy
> > + sed -ne s/Y\{2,\}\(xy\)\1/&/p
> > YYxyxy
> > + set +x
> > + echo YYxyxy
> > + grep -e Y\{0,\}\(xy\)\1
> > + echo YYxyxy
> > + sed -ne s/Y\{0,\}\(xy\)\1/&/p
> > + set +x
> > + echo YYxyxy
> > + grep -e Y*\(xy\)\1
> > YYxyxy
> > + echo YYxyxy
> > + sed -ne s/Y*\(xy\)\1/&/p
> > YYxyxy
> > + set +x
> > + echo YYxyxy
> > + grep -e Y\{0,\}\(xy\)xy
> > YYxyxy
> > + echo YYxyxy
> > + sed -ne s/Y\{0,\}\(xy\)xy/&/p
> > YYxyxy
> > + set +x
> > > Fix:
> > No known general work-around
> >
> >
>
> Hi,
>
> I can reproduce on current. Do you have an idea if NetBSD or FreeBSD
> suffer from the same?
>
> -Otto
>
These are the test cases incorporated into our regress tests:
-Otto
Index: tests
===================================================================
RCS file: /cvs/src/regress/lib/libc/regex/tests,v
retrieving revision 1.9
diff -u -p -r1.9 tests
--- tests 28 Dec 2020 21:41:55 -0000 1.9
+++ tests 2 Apr 2021 14:16:59 -0000
@@ -595,3 +595,18 @@ a?b - ab ab
# FreeBSD PR 130504
(.|())(b) - ab ab
(()|.)(b) - ab ab
+
+# Some BRE cases where \{0,\} makes a backref go wrong, as reported by Michael Paoli
+Y*\(x\)\1 b YYxx YYxx
+Y\{2,\}\(x\)\1 b YYxx YYxx
+# Fails currently
+#Y\{0,\}\(x\)\1 b YYxx YYxx
+Y\{0,\}\(x\) b YYxx YYx
+Y\{2,\}x\{1,\} b YYxx YYxx
+Y\{2,\}x\{0,\}z b YYxxz YYxxz
+Y\{0,\}x\{0,\}z b YYxxz YYxxz
+Y\{2,\}\(xy\)\1 b YYxyxy YYxyxy
+# Fails currently
+#Y\{0,\}\(xy\)\1 b YYxyxy YYxyxy
+Y*\(xy\)\1 b YYxyxy YYxyxy
+Y\{0,\}\(xy\)xy b YYxyxy YYxyxy
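The same cases can be exercised from the shell before the regress suite is run. This is a minimal sketch, assuming each line of the tests file holds whitespace-separated fields (RE, flags, input string, expected matched substring), as the added lines suggest, and that `b` in the flags column selects a BRE. `check_bre_tests` is a hypothetical helper, not the actual regress driver; it runs each BRE through sed(1), wraps the leftmost-longest match in `<` and `>`, and compares the extracted text with the expected field. It therefore assumes the input strings never contain `<` or `>`.

```shell
#!/bin/sh
# Hypothetical helper: check BRE lines written in the regress "tests"
# layout (RE flags input expected-match) using sed(1).
# Blank lines, comment lines and non-"b" lines are skipped.
# Returns non-zero if any case fails.
check_bre_tests() {
    fails=0
    while read -r re flags str want; do
        case $re in ''|'#'*) continue ;; esac
        [ "$flags" = b ] || continue
        # Mark the match as <...>; sed prints nothing if there is no match.
        marked=$(printf '%s\n' "$str" | sed -n "s/$re/<&>/p")
        got=${marked#*"<"}
        got=${got%%">"*}
        if [ "$got" = "$want" ]; then
            printf 'ok   %s\n' "$re"
        else
            printf 'FAIL %s (got "%s", want "%s")\n' "$re" "$got" "$want"
            fails=$((fails + 1))
        fi
    done
    [ "$fails" -eq 0 ]
}

# The new cases, with the two "Fails currently" lines enabled.
check_bre_tests <<'EOF'
Y*\(x\)\1 b YYxx YYxx
Y\{2,\}\(x\)\1 b YYxx YYxx
# Commented out in the diff above
Y\{0,\}\(x\)\1 b YYxx YYxx
Y\{0,\}\(x\) b YYxx YYx
Y\{2,\}x\{1,\} b YYxx YYxx
Y\{2,\}x\{0,\}z b YYxxz YYxxz
Y\{0,\}x\{0,\}z b YYxxz YYxxz
Y\{2,\}\(xy\)\1 b YYxyxy YYxyxy
# Commented out in the diff above
Y\{0,\}\(xy\)\1 b YYxyxy YYxyxy
Y*\(xy\)\1 b YYxyxy YYxyxy
Y\{0,\}\(xy\)xy b YYxyxy YYxyxy
EOF
```

The here-document is fed to the function directly rather than through a pipe, so the `while` loop runs in the current shell and the failure count survives to the final status check. On a system without the bug every line should print "ok"; on an affected system the two re-enabled backreference cases would be the ones expected to print FAIL.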