Synopsis:      Basic Regular Expression (BRE) bug in \{m,n\} with \(\) and \n
Category:      library
Environment:
        System      : OpenBSD 6.7
        Details     : OpenBSD 6.7 (GENERIC) #7: Wed Jan  6 15:19:25 MST 2021
[email protected]:/usr/src/sys/arch/amd64/compile/GENERIC

        Architecture: OpenBSD.amd64
        Machine     : amd64
Description:
        Certain BREs fail or misbehave unexpectedly.
        The failures are the same in both grep and sed (without -E).
        They occur only with certain combinations of interval
        expressions \{m,n\}, subexpressions \(\), and back-references
        \n (where n is a digit); dropping any one of the three
        generally keeps the bug from triggering.
        The bug is most clearly visible as unexpected behavior of
        the \{m,n\} part in that context.  If more of the
        (apparently interdependent) context is removed, the bug
        doesn't show up.  For example, some of the clearest cases
        come from replacing * with \{0,\} in the BRE: the results
        should be identical, but they are not.  These same BREs
        work as expected with sed and grep on both Solaris 11
        and GNU/Linux.
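        In a BRE, * and \{0,\} both mean "zero or more occurrences
        of the preceding atom", so swapping one for the other should
        never change whether a line matches.  A minimal sketch of that
        comparison follows; the helper name `matches` is hypothetical
        and not part of any tool.  On a correct implementation both
        calls print "match"; going by the grep run below, the second
        call would be expected to print "no match" on OpenBSD 6.7.

```shell
#!/bin/sh
# Hypothetical helper (illustration only): print "match" if STRING
# contains a match for BRE according to grep, else "no match".
matches() {
	if printf '%s\n' "$1" | grep -q -e "$2"; then
		echo match
	else
		echo 'no match'
	fi
}

# * and \{0,\} are equivalent in a BRE, so a correct
# implementation prints "match" for both calls.
matches 'YYxx' 'Y*\(x\)\1'
matches 'YYxx' 'Y\{0,\}\(x\)\1'
```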
How-To-Repeat:
        The following shell code illustrates the problem, showing
        both cases where the bug appears and slightly different
        contexts where it does not.
        In every case, grep and sed should print the STRING echoed
        into them, since the BRE matches it; in the buggy cases
        they print nothing.
        I also suggest adding these as regression test cases, to
        catch the issue should it ever recur.  :-)
        Code showing where the bug does (and doesn't) show up:
        (
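                exec 2>&1  # merge the set -x trace (stderr) into stdout
                # Arguments come in pairs: STRING, then a BRE that
                # should match it.
                set -- \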
                        'YYxx' 'Y*\(x\)\1' \
                        'YYxx' 'Y\{0,\}\(x\)\1' \
                        'YYxx' 'Y\{2,\}\(x\)\1' \
                        'YYxx' 'Y\{0,\}\(x\)' \
                        'YYxx' 'Y\{2,\}x' \
                        'YYxx' 'Y\{2,\}x\{1,\}' \
                        'YYxx' 'Y\{2,\}x\{0,\}' \
                        'YYxxz' 'Y\{2,\}x\{0,\}z' \
                        'YYxxz' 'Y\{0,\}x\{0,\}z' \
                        'YYxyxy' 'Y\{2,\}\(xy\)\1' \
                        'YYxyxy' 'Y\{0,\}\(xy\)\1' \
                        'YYxyxy' 'Y*\(xy\)\1' \
                        'YYxyxy' 'Y\{0,\}\(xy\)xy'
                while [ "$#" -ge 2 ]
                do
                        STRING="$1"; shift; BRE="$1"; shift
                        set -x
                        echo "$STRING" | grep -e "$BRE"
                        echo "$STRING" | sed -ne "s/$BRE/&/p"
                        set +x
                done
        )
        Example run of the above code.  The bug is present wherever
        the STRING echoed into grep or sed fails to appear in the
        output:
        + echo YYxx
        + grep -e Y*\(x\)\1
        YYxx
        + echo YYxx
        + sed -ne s/Y*\(x\)\1/&/p
        YYxx
        + set +x
        + echo YYxx
        + grep -e Y\{0,\}\(x\)\1
        + echo YYxx
        + sed -ne s/Y\{0,\}\(x\)\1/&/p
        + set +x
        + echo YYxx
        + grep -e Y\{2,\}\(x\)\1
        YYxx
        + echo YYxx
        + sed -ne s/Y\{2,\}\(x\)\1/&/p
        YYxx
        + set +x
        + echo YYxx
        + grep -e Y\{0,\}\(x\)
        YYxx
        + echo YYxx
        + sed -ne s/Y\{0,\}\(x\)/&/p
        YYxx
        + set +x
        + echo YYxx
        + grep -e Y\{2,\}x
        YYxx
        + echo YYxx
        + sed -ne s/Y\{2,\}x/&/p
        YYxx
        + set +x
        + echo YYxx
        + grep -e Y\{2,\}x\{1,\}
        YYxx
        + echo YYxx
        + sed -ne s/Y\{2,\}x\{1,\}/&/p
        YYxx
        + set +x
        + echo YYxx
        + grep -e Y\{2,\}x\{0,\}
        YYxx
        + echo YYxx
        + sed -ne s/Y\{2,\}x\{0,\}/&/p
        YYxx
        + set +x
        + echo YYxxz
        + grep -e Y\{2,\}x\{0,\}z
        YYxxz
        + echo YYxxz
        + sed -ne s/Y\{2,\}x\{0,\}z/&/p
        YYxxz
        + set +x
        + echo YYxxz
        + grep -e Y\{0,\}x\{0,\}z
        YYxxz
        + echo YYxxz
        + sed -ne s/Y\{0,\}x\{0,\}z/&/p
        YYxxz
        + set +x
        + echo YYxyxy
        + grep -e Y\{2,\}\(xy\)\1
        YYxyxy
        + echo YYxyxy
        + sed -ne s/Y\{2,\}\(xy\)\1/&/p
        YYxyxy
        + set +x
        + echo YYxyxy
        + grep -e Y\{0,\}\(xy\)\1
        + echo YYxyxy
        + sed -ne s/Y\{0,\}\(xy\)\1/&/p
        + set +x
        + echo YYxyxy
        + grep -e Y*\(xy\)\1
        YYxyxy
        + echo YYxyxy
        + sed -ne s/Y*\(xy\)\1/&/p
        YYxyxy
        + set +x
        + echo YYxyxy
        + grep -e Y\{0,\}\(xy\)xy
        YYxyxy
        + echo YYxyxy
        + sed -ne s/Y\{0,\}\(xy\)xy/&/p
        YYxyxy
        + set +x
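        Following the suggestion above to add regression tests, the
        same cases can be made self-checking instead of relying on
        reading a set -x trace.  This is a minimal sketch, not an
        existing OpenBSD regress test; the `check` helper and the
        `fail` variable are illustrative names.  Each case prints
        PASS or FAIL for grep and for sed, and the script's final
        status is 0 only if every case passed.

```shell
#!/bin/sh
# Sketch of a self-checking regression test built from the cases
# above.  Arguments come in pairs: STRING, then a BRE that should
# match it, so both grep and sed must print STRING back unchanged.
set -- \
	'YYxx'   'Y*\(x\)\1' \
	'YYxx'   'Y\{0,\}\(x\)\1' \
	'YYxx'   'Y\{2,\}\(x\)\1' \
	'YYxx'   'Y\{0,\}\(x\)' \
	'YYxx'   'Y\{2,\}x' \
	'YYxx'   'Y\{2,\}x\{1,\}' \
	'YYxx'   'Y\{2,\}x\{0,\}' \
	'YYxxz'  'Y\{2,\}x\{0,\}z' \
	'YYxxz'  'Y\{0,\}x\{0,\}z' \
	'YYxyxy' 'Y\{2,\}\(xy\)\1' \
	'YYxyxy' 'Y\{0,\}\(xy\)\1' \
	'YYxyxy' 'Y*\(xy\)\1' \
	'YYxyxy' 'Y\{0,\}\(xy\)xy'

fail=0

# check TOOL ACTUAL EXPECTED BRE: report PASS/FAIL, remember failures.
check() {
	if [ "$2" = "$3" ]; then
		echo "PASS: $1 '$4'"
	else
		echo "FAIL: $1 '$4' (expected '$3', got '$2')"
		fail=1
	fi
}

while [ "$#" -ge 2 ]
do
	STRING=$1 BRE=$2
	shift 2
	# grep prints the line only if BRE matches it.
	check grep "$(printf '%s\n' "$STRING" | grep -e "$BRE")" \
	    "$STRING" "$BRE"
	# s/BRE/&/p replaces the match with itself and prints only
	# when a substitution happened, so it also echoes the line.
	check sed "$(printf '%s\n' "$STRING" | sed -ne "s/$BRE/&/p")" \
	    "$STRING" "$BRE"
done

# Final status: 0 only if every case passed.
[ "$fail" -eq 0 ]
```

        Going by the run above, on OpenBSD 6.7 this would be expected
        to report FAIL for the 'Y\{0,\}\(x\)\1' and
        'Y\{0,\}\(xy\)\1' cases under both grep and sed, and PASS
        for the rest.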
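Fix:
        No known general workaround.  In the cases above, writing
        * instead of \{0,\} avoids the failure, but that
        substitution is only available for \{0,\}.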
