Synopsis: Basic Regular Expression (BRE) bug in \{m,n\} with \(\) and \n
Category: library
Environment:
System : OpenBSD 6.7
Details : OpenBSD 6.7 (GENERIC) #7: Wed Jan 6 15:19:25 MST 2021
[email protected]:/usr/src/sys/arch/amd64/compile/GENERIC
Architecture: OpenBSD.amd64
Machine : amd64
Description:
Certain BREs fail to match input that they should match.
The failures are identical in grep and sed (both without -E).
They occur only when the BRE combines all three of: an interval
expression \{m,n\}, a subexpression \(\), and a back-reference \n
(where n is a digit). Dropping any one of the three generally
avoids the bug.
The bug shows most clearly as unexpected behavior of the
\{m,n\} portion in that context. In some of the clearest cases,
replacing * with \{0,\} in an otherwise working BRE makes the
match fail, although the two forms are equivalent and should
give the same results. The same BREs work with sed and grep
under both Solaris 11 and GNU/Linux.
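The expected equivalence of * and \{0,\} can be checked in isolation. The following is a minimal sketch rather than part of the original test run; the helper name bre_matches is hypothetical and only grep's standard -q and -e options are assumed. On a correctly behaving implementation both calls should report a match.

```shell
# bre_matches STRING BRE
# Prints "match" if grep (in BRE mode) selects STRING, "no match" otherwise.
# The helper name is illustrative only; it is not an existing utility.
bre_matches() {
    if printf '%s\n' "$1" | grep -q -e "$2"; then
        echo "match"
    else
        echo "no match"
    fi
}

# '*' and '\{0,\}' both mean "zero or more", so these two should agree.
bre_matches 'YYxx' 'Y*\(x\)\1'
bre_matches 'YYxx' 'Y\{0,\}\(x\)\1'
```

If the second call reports "no match" while the first reports "match", the system shows the behavior described in this report.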
How-To-Repeat:
The example code below illustrates the problem. It covers both
cases where the bug shows up and slightly different contexts
where it does not.
In each case the output should be the STRING that is echoed
into grep/sed with our BRE; in the bug cases there is no output.
It is also suggested that test cases be added to the regression
tests to catch this, should the issue recur. :-)
Example code to show where bug does (and doesn't) show up:
(
    # Send stderr to stdout so the set -x trace interleaves with the output.
    exec 2>&1
    # The positional parameters are STRING/BRE pairs.
    set -- \
    'YYxx' 'Y*\(x\)\1' \
    'YYxx' 'Y\{0,\}\(x\)\1' \
    'YYxx' 'Y\{2,\}\(x\)\1' \
    'YYxx' 'Y\{0,\}\(x\)' \
    'YYxx' 'Y\{2,\}x' \
    'YYxx' 'Y\{2,\}x\{1,\}' \
    'YYxx' 'Y\{2,\}x\{0,\}' \
    'YYxxz' 'Y\{2,\}x\{0,\}z' \
    'YYxxz' 'Y\{0,\}x\{0,\}z' \
    'YYxyxy' 'Y\{2,\}\(xy\)\1' \
    'YYxyxy' 'Y\{0,\}\(xy\)\1' \
    'YYxyxy' 'Y*\(xy\)\1' \
    'YYxyxy' 'Y\{0,\}\(xy\)xy'
    while [ "$#" -ge 2 ]
    do
        STRING="$1"; shift; BRE="$1"; shift
        set -x
        echo "$STRING" | grep -e "$BRE"
        echo "$STRING" | sed -ne "s/$BRE/&/p"
        set +x
    done
)
Example run of the above code. The bug is present wherever the
STRING echoed into grep/sed fails to appear in the output. In
this run that is the case for Y\{0,\}\(x\)\1 and
Y\{0,\}\(xy\)\1:
+ echo YYxx
+ grep -e Y*\(x\)\1
YYxx
+ echo YYxx
+ sed -ne s/Y*\(x\)\1/&/p
YYxx
+ set +x
+ echo YYxx
+ grep -e Y\{0,\}\(x\)\1
+ echo YYxx
+ sed -ne s/Y\{0,\}\(x\)\1/&/p
+ set +x
+ echo YYxx
+ grep -e Y\{2,\}\(x\)\1
YYxx
+ echo YYxx
+ sed -ne s/Y\{2,\}\(x\)\1/&/p
YYxx
+ set +x
+ echo YYxx
+ grep -e Y\{0,\}\(x\)
YYxx
+ echo YYxx
+ sed -ne s/Y\{0,\}\(x\)/&/p
YYxx
+ set +x
+ echo YYxx
+ grep -e Y\{2,\}x
YYxx
+ echo YYxx
+ sed -ne s/Y\{2,\}x/&/p
YYxx
+ set +x
+ echo YYxx
+ grep -e Y\{2,\}x\{1,\}
YYxx
+ echo YYxx
+ sed -ne s/Y\{2,\}x\{1,\}/&/p
YYxx
+ set +x
+ echo YYxx
+ grep -e Y\{2,\}x\{0,\}
YYxx
+ echo YYxx
+ sed -ne s/Y\{2,\}x\{0,\}/&/p
YYxx
+ set +x
+ echo YYxxz
+ grep -e Y\{2,\}x\{0,\}z
YYxxz
+ echo YYxxz
+ sed -ne s/Y\{2,\}x\{0,\}z/&/p
YYxxz
+ set +x
+ echo YYxxz
+ grep -e Y\{0,\}x\{0,\}z
YYxxz
+ echo YYxxz
+ sed -ne s/Y\{0,\}x\{0,\}z/&/p
YYxxz
+ set +x
+ echo YYxyxy
+ grep -e Y\{2,\}\(xy\)\1
YYxyxy
+ echo YYxyxy
+ sed -ne s/Y\{2,\}\(xy\)\1/&/p
YYxyxy
+ set +x
+ echo YYxyxy
+ grep -e Y\{0,\}\(xy\)\1
+ echo YYxyxy
+ sed -ne s/Y\{0,\}\(xy\)\1/&/p
+ set +x
+ echo YYxyxy
+ grep -e Y*\(xy\)\1
YYxyxy
+ echo YYxyxy
+ sed -ne s/Y*\(xy\)\1/&/p
YYxyxy
+ set +x
+ echo YYxyxy
+ grep -e Y\{0,\}\(xy\)xy
YYxyxy
+ echo YYxyxy
+ sed -ne s/Y\{0,\}\(xy\)xy/&/p
YYxyxy
+ set +x
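A regression check along the lines suggested above could look like the following. This is only a sketch under stated assumptions: check_bre and the status variable are hypothetical names, not part of any existing test suite, and the check simply requires that both grep and sed return the input string unchanged.

```shell
# check_bre STRING BRE
# Hypothetical helper: passes only if both grep and sed (BRE mode)
# output STRING unchanged for the given BRE.
check_bre() {
    # "|| true" keeps a non-matching grep from aborting under "set -e".
    g=$(printf '%s\n' "$1" | grep -e "$2" || true)
    s=$(printf '%s\n' "$1" | sed -ne "s/$2/&/p")
    if [ "$g" = "$1" ] && [ "$s" = "$1" ]; then
        printf 'PASS: %s\n' "$2"
    else
        printf 'FAIL: %s\n' "$2"
        return 1
    fi
}

# Each '*' form is paired with its '\{0,\}' equivalent from the report.
status=0
check_bre 'YYxx'   'Y*\(x\)\1'       || status=1
check_bre 'YYxx'   'Y\{0,\}\(x\)\1'  || status=1
check_bre 'YYxyxy' 'Y*\(xy\)\1'      || status=1
check_bre 'YYxyxy' 'Y\{0,\}\(xy\)\1' || status=1
printf 'status=%s\n' "$status"
```

printf is used instead of echo for the report lines because some shells' echo interprets backslash sequences such as \1 in the BRE.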
Fix:
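No known general workaround. In the cases shown above, writing
* in place of \{0,\} avoids the failure, but that covers only
the zero-or-more interval.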