On Tue, Feb 23, 2021 at 04:16:09AM -0800, Michael Paoli wrote:
> > Synopsis: Basic Regular Expression (BRE) bug in \{m,n\} with \(\) and \n
> > Category: library
> > Environment:
> System : OpenBSD 6.7
> Details : OpenBSD 6.7 (GENERIC) #7: Wed Jan 6 15:19:25 MST 2021
> [email protected]:/usr/src/sys/arch/amd64/compile/GENERIC
>
> Architecture: OpenBSD.amd64
> Machine : amd64
> > Description:
> Certain BRE expressions fail/misbehave unexpectedly.
> The failures are the same in both grep and sed (without -E).
> The failures occur only with certain combinations of the
> \{\}, \(\), and \n (where n is a digit) syntax; dropping any one
> of those generally avoids the bug.
> The bug can be seen most clearly in unexpected
> behavior of the \{m,n\} portion in the given context.
> If more of the (apparently dependent) context is removed,
> the bug doesn't show up. E.g. some of the clearest cases
> involve replacing * with \{0,\} in the BRE, and getting
> quite unexpected results (one would expect the results
> to be the same). These same BREs work under both
> Solaris 11 and GNU/Linux with their sed and grep.
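The expectation that * and \{0,\} behave the same can be checked directly. The following is only a minimal sketch: the helper name same_result is hypothetical (not part of the report), and it assumes a POSIX sh with grep in PATH. On an implementation without the bug it would be expected to print "consistent"; given the run output in the report, an affected system would be expected to print "inconsistent".

```shell
# same_result STRING BRE1 BRE2 (hypothetical helper):
# succeeds when grep gives the same match/no-match answer for both BREs.
same_result() {
    a=0; b=0
    printf '%s\n' "$1" | grep -e "$2" >/dev/null || a=1
    printf '%s\n' "$1" | grep -e "$3" >/dev/null || b=1
    [ "$a" -eq "$b" ]
}

# Y* and Y\{0,\} should be interchangeable in a BRE.
if same_result 'YYxx' 'Y*\(x\)\1' 'Y\{0,\}\(x\)\1'; then
    echo "consistent"
else
    echo "inconsistent"
fi
```

The helper compares only match/no-match status, so it does not depend on what grep prints.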
> > How-To-Repeat:
> This example code illustrates the problem. It shows both
> cases where the bug shows up and slightly differing
> contexts where the bug does not occur.
> In each of these cases, the output should be the STRING
> we set/echo into grep/sed where we use our BRE, but in the bug
> cases we get no output.
> It's also suggested that test cases be added to the code to catch
> possible regressions, should the issue recur. :-)
> Example code to show where bug does (and doesn't) show up:
> (
> exec 2>&1
> set -- \
> 'YYxx' 'Y*\(x\)\1' \
> 'YYxx' 'Y\{0,\}\(x\)\1' \
> 'YYxx' 'Y\{2,\}\(x\)\1' \
> 'YYxx' 'Y\{0,\}\(x\)' \
> 'YYxx' 'Y\{2,\}x' \
> 'YYxx' 'Y\{2,\}x\{1,\}' \
> 'YYxx' 'Y\{2,\}x\{0,\}' \
> 'YYxxz' 'Y\{2,\}x\{0,\}z' \
> 'YYxxz' 'Y\{0,\}x\{0,\}z' \
> 'YYxyxy' 'Y\{2,\}\(xy\)\1' \
> 'YYxyxy' 'Y\{0,\}\(xy\)\1' \
> 'YYxyxy' 'Y*\(xy\)\1' \
> 'YYxyxy' 'Y\{0,\}\(xy\)xy'
> while [ "$#" -ge 2 ]
> do
> STRING="$1"; shift; BRE="$1"; shift
> set -x
> echo "$STRING" | grep -e "$BRE"
> echo "$STRING" | sed -ne "s/$BRE/&/p"
> set +x
> done
> )
> Example run of above code. Bug is present where our
> STRING echoed into grep/sed fails to appear in the
> output:
> + echo YYxx
> + grep -e Y*\(x\)\1
> YYxx
> + echo YYxx
> + sed -ne s/Y*\(x\)\1/&/p
> YYxx
> + set +x
> + echo YYxx
> + grep -e Y\{0,\}\(x\)\1
> + echo YYxx
> + sed -ne s/Y\{0,\}\(x\)\1/&/p
> + set +x
> + echo YYxx
> + grep -e Y\{2,\}\(x\)\1
> YYxx
> + echo YYxx
> + sed -ne s/Y\{2,\}\(x\)\1/&/p
> YYxx
> + set +x
> + echo YYxx
> + grep -e Y\{0,\}\(x\)
> YYxx
> + echo YYxx
> + sed -ne s/Y\{0,\}\(x\)/&/p
> YYxx
> + set +x
> + echo YYxx
> + grep -e Y\{2,\}x
> YYxx
> + echo YYxx
> + sed -ne s/Y\{2,\}x/&/p
> YYxx
> + set +x
> + echo YYxx
> + grep -e Y\{2,\}x\{1,\}
> YYxx
> + echo YYxx
> + sed -ne s/Y\{2,\}x\{1,\}/&/p
> YYxx
> + set +x
> + echo YYxx
> + grep -e Y\{2,\}x\{0,\}
> YYxx
> + echo YYxx
> + sed -ne s/Y\{2,\}x\{0,\}/&/p
> YYxx
> + set +x
> + echo YYxxz
> + grep -e Y\{2,\}x\{0,\}z
> YYxxz
> + echo YYxxz
> + sed -ne s/Y\{2,\}x\{0,\}z/&/p
> YYxxz
> + set +x
> + echo YYxxz
> + grep -e Y\{0,\}x\{0,\}z
> YYxxz
> + echo YYxxz
> + sed -ne s/Y\{0,\}x\{0,\}z/&/p
> YYxxz
> + set +x
> + echo YYxyxy
> + grep -e Y\{2,\}\(xy\)\1
> YYxyxy
> + echo YYxyxy
> + sed -ne s/Y\{2,\}\(xy\)\1/&/p
> YYxyxy
> + set +x
> + echo YYxyxy
> + grep -e Y\{0,\}\(xy\)\1
> + echo YYxyxy
> + sed -ne s/Y\{0,\}\(xy\)\1/&/p
> + set +x
> + echo YYxyxy
> + grep -e Y*\(xy\)\1
> YYxyxy
> + echo YYxyxy
> + sed -ne s/Y*\(xy\)\1/&/p
> YYxyxy
> + set +x
> + echo YYxyxy
> + grep -e Y\{0,\}\(xy\)xy
> YYxyxy
> + echo YYxyxy
> + sed -ne s/Y\{0,\}\(xy\)xy/&/p
> YYxyxy
> + set +x
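For regression-test use, the same checks can be reduced to one PASS/FAIL line per BRE instead of a trace. This is a sketch only, not the format of any existing regress suite: the function name check_bre and the variable rc are hypothetical, and only a few of the pairs from the report are listed.

```shell
#!/bin/sh
# check_bre STRING BRE (hypothetical helper):
# PASS when both grep and sed echo STRING back unchanged, FAIL otherwise.
check_bre() {
    g=$(printf '%s\n' "$1" | grep -e "$2")
    s=$(printf '%s\n' "$1" | sed -ne "s/$2/&/p")
    if [ "$g" = "$1" ] && [ "$s" = "$1" ]; then
        printf 'PASS %s\n' "$2"
    else
        printf 'FAIL %s\n' "$2"
        return 1
    fi
}

# rc becomes 1 if any case fails.
rc=0
check_bre 'YYxx'   'Y*\(x\)\1'        || rc=1
check_bre 'YYxx'   'Y\{0,\}\(x\)\1'   || rc=1
check_bre 'YYxyxy' 'Y\{0,\}\(xy\)\1'  || rc=1
check_bre 'YYxyxy' 'Y\{0,\}\(xy\)xy'  || rc=1
echo "rc=$rc"
```

Going by the run above, the second and third cases are the ones that would be expected to report FAIL on an affected system.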
> > Fix:
> No known general work-around
>
>
Hi,
I can reproduce on current. Do you have any idea if NetBSD or FreeBSD
suffer from the same?
-Otto