On Fri, Jan 17, 2014 at 08:43:46AM -0500, Chet Ramey wrote:
> On 1/16/14 6:46 PM, Eduardo A. Bustamante López wrote:
> > The DEL ($'\177') character does not behave like the other control
> > characters when used with the regex operator inside the test keyword.
>
> This has to do with the expansion of $r and that $r includes a backslash.
> When combined with the internal quoting bash does, and the fact that the
> backslash is special to pattern matching, we end up with this problem.
> I've only thought about it a little so far, but I don't know if there's a
> quick or simple fix. This may have to wait until after bash-4.3 is
> released.
>
> Chet

I understand that a backslash preceding a character *could* make it fail
to match, but $'\177' is the *only* non-graphic character that behaves
this way.
This should make it clearer:

ubuntu@ubuntu:~$ for c in $'\001' $'\r' $'\177' $'\277' $'\377'; do
> r="\\$c"; [[ $c =~ $r ]]; printf 'c=%q r=%q %d\n' "$c" "$r" "$?"
> done
c=$'\001' r=$'\\\001' 0
c=$'\r' r=$'\\\r' 0
c=$'\177' r=$'\\\177' 1
c=$'\277' r=$'\\\277' 0
c=$'\377' r=$'\\\377' 0

My question is more about why $'\177' behaves differently from the
other characters than about whether the preceding backslash should make
it match. That is, I would expect one of these two outputs:

O1:
    c=$'\001' r=$'\\\001' 1
    c=$'\r' r=$'\\\r' 1
    c=$'\177' r=$'\\\177' 1
    c=$'\277' r=$'\\\277' 1
    c=$'\377' r=$'\\\377' 1

O2:
    c=$'\001' r=$'\\\001' 0
    c=$'\r' r=$'\\\r' 0
    c=$'\177' r=$'\\\177' 0
    c=$'\277' r=$'\\\277' 0
    c=$'\377' r=$'\\\377' 0

But the real output shows that c=$'\177' is treated specially:

c=$'\001' r=$'\\\001' 0
c=$'\r' r=$'\\\r' 0
c=$'\177' r=$'\\\177' 1  <-- this one behaves differently.
c=$'\277' r=$'\\\277' 0
c=$'\377' r=$'\\\377' 0

---

Now, regarding whether the backslash should be treated specially or
literally, the only thing I can contribute is the behavior of GNU sed,
which handles a non-graphic character preceded by a backslash the same
as the character by itself:

ubuntu@ubuntu:~$ cat sed
mapfile -t chars < <(
  printf '\\x%x\n' {1..255} | while read -r c; do
    printf "$c"'\n';
  done
);

for sed in sed 'sed -r'; do
  printf -- '--- sed: %s ---\n' "$sed"
  for c in "${chars[@]}"; do
    printf '%q > %q\n' "$c" "$(printf %s\\n "$c" | $sed "s/\\$c//" 2>&1)"
  done | grep -v "''\$"
done
ubuntu@ubuntu:~$ bash sed
--- sed: sed ---
'' > sed:\ -e\ expression\ #1\,\ char\ 5:\ unterminated\ \`s\'\ command
'' > sed:\ -e\ expression\ #1\,\ char\ 5:\ unterminated\ \`s\'\ command
\' > \'
\( > sed:\ -e\ expression\ #1\,\ char\ 6:\ Unmatched\ \(\ or\ \\\(
\) > sed:\ -e\ expression\ #1\,\ char\ 6:\ Unmatched\ \)\ or\ \\\)
1 > sed:\ -e\ expression\ #1\,\ char\ 6:\ Invalid\ back\ reference
2 > sed:\ -e\ expression\ #1\,\ char\ 6:\ Invalid\ back\ reference
3 > sed:\ -e\ expression\ #1\,\ char\ 6:\ Invalid\ back\ reference
4 > sed:\ -e\ expression\ #1\,\ char\ 6:\ Invalid\ back\ reference
5 > sed:\ -e\ expression\ #1\,\ char\ 6:\ Invalid\ back\ reference
6 > sed:\ -e\ expression\ #1\,\ char\ 6:\ Invalid\ back\ reference
7 > sed:\ -e\ expression\ #1\,\ char\ 6:\ Invalid\ back\ reference
8 > sed:\ -e\ expression\ #1\,\ char\ 6:\ Invalid\ back\ reference
9 > sed:\ -e\ expression\ #1\,\ char\ 6:\ Invalid\ back\ reference
\< > \<
\> > \>
B > B
W > W
\` > \`
a > a
b > b
c > sed:\ -e\ expression\ #1\,\ char\ 6:\ Trailing\ backslash
f > f
n > n
r > r
s > s
t > t
v > v
\{ > sed:\ -e\ expression\ #1\,\ char\ 6:\ Invalid\ preceding\ regular\ expression
\| > \|
--- sed: sed -r ---
'' > sed:\ -e\ expression\ #1\,\ char\ 5:\ unterminated\ \`s\'\ command
'' > sed:\ -e\ expression\ #1\,\ char\ 5:\ unterminated\ \`s\'\ command
\' > \'
1 > sed:\ -e\ expression\ #1\,\ char\ 6:\ Invalid\ back\ reference
2 > sed:\ -e\ expression\ #1\,\ char\ 6:\ Invalid\ back\ reference
3 > sed:\ -e\ expression\ #1\,\ char\ 6:\ Invalid\ back\ reference
4 > sed:\ -e\ expression\ #1\,\ char\ 6:\ Invalid\ back\ reference
5 > sed:\ -e\ expression\ #1\,\ char\ 6:\ Invalid\ back\ reference
6 > sed:\ -e\ expression\ #1\,\ char\ 6:\ Invalid\ back\ reference
7 > sed:\ -e\ expression\ #1\,\ char\ 6:\ Invalid\ back\ reference
8 > sed:\ -e\ expression\ #1\,\ char\ 6:\ Invalid\ back\ reference
9 > sed:\ -e\ expression\ #1\,\ char\ 6:\ Invalid\ back\ reference
\< > \<
\> > \>
B > B
W > W
\` > \`
a > a
b > b

The grep drops every line where the substitution removed the character,
so only the mismatches and errors are listed. As you can see, every
non-graphic character preceded by a backslash is treated the same as the
bare character. I do not know how other regex engines treat this case.
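The same kind of survey can be pointed at bash's own =~ operator instead
of sed. What follows is only a rough sketch: the helper names
match_status and survey_backslash are hypothetical (they are not part of
the report above), the C locale is an assumption made so that each byte
is handled as a single character, and the result depends on the bash
version and the libc regex implementation, so no output is claimed here.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration only.

# Print the exit status of [[ $1 =~ $2 ]]:
#   0 = match, 1 = no match, 2 = pattern rejected as an invalid regex.
match_status() {
  local subject=$1 pattern=$2 status=0
  [[ $subject =~ $pattern ]] || status=$?
  printf '%d\n' "$status"
}

# For each argument, compare the bare pattern with the backslash-prefixed
# pattern and report the arguments for which the two statuses differ.
survey_backslash() {
  local LC_ALL=C c plain escaped   # assumption: byte-wise C locale
  for c in "$@"; do
    plain=$(match_status "$c" "$c")
    escaped=$(match_status "$c" "\\$c")
    if [[ $plain != "$escaped" ]]; then
      printf '%q: plain=%s escaped=%s\n' "$c" "$plain" "$escaped"
    fi
  done
}
```

For example, survey_backslash $'\001' $'\r' $'\177' $'\277' $'\377'
would list only the characters whose status changes when a backslash is
put in front of them.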
---

In case you're interested in why I care about this issue ($'\177'), see
the special case I had to add to the ''requote2'' function to make it
work:
https://github.com/lhunath/scripts/issues/3#issuecomment-32551132

---

Regarding the Cygwin issue:

$ for c in $'\177' $'\200' $'\277' $'\376' $'\377'; do
> r=$c; [[ $c =~ $r ]]; printf 'c=%q r=%q %d\n' "$c" "$r" "$?";
> done;
$ echo "$BASH_VERSION $OS"
c=$'\177' r=$'\177' 0
c=$'\200' r=$'\200' 2
c=$'\277' r=$'\277' 2
c=$'\376' r=$'\376' 2
c=$'\377' r=$'\377' 2
4.1.10(4)-release Windows_NT

Notice that even [[ $x =~ $x ]] fails here, and with status code 2.

---

So, in short, there are three issues here:

1) Why is $'\177' handled differently from the other non-graphic
   characters?

2) Why does bash behave incompatibly on Ubuntu and on Cygwin (i.e. the
   [[ keyword returning 2 for characters outside the ASCII range when
   trying to match them with =~)?

3) How should bash treat a character preceded by a backslash in regular
   expressions (and globs, as Dan reported in a previous issue)?

I personally care more about 1 & 2, because those two prevent me from
writing a function that works on both Linux and Cygwin, and the special
case for $'\177' makes me feel dirty. However bash handles 3, I can deal
with it as long as it's consistent.

-- 
Eduardo Alan Bustamante López