Re: Difference in POSIX regular expression for bash's '=~' operator and POSIXLY_CORRECT grep -E
Andreas Schwab wrote:
> Why? There are no subexpression in your regexps.

My bad. I really should have looked twice at what the manual says about BASH_REMATCH and checked what subexpressions are supposed to produce. I thought they were a synonym for the multiple results returned by a global flag, assuming that bash's '=~' had a permanent global flag set (which is why I was comparing it with grep -E). That does not seem to be something that is currently implemented.

Chet Ramey wrote:
> Bash doesn't provide its own implementation of EREs: it uses whatever libc supplies. I assume that's different from whatever `grep -E' uses.

Greg Wooledge wrote:
> If you want [a-z] to work like ASCII does, you'll need to use LC_CTYPE=C. If you want to match lowercase letters in your current locale, you should use [[:lower:]] instead.

Thank you for that, though. It is very interesting, to say the least, to get an idea of the differences between ERE implementations, and at the very least I'll make sure to set my LC_CTYPE to C from now on.

From: Chet Ramey
Sent: Tuesday, May 20, 2025 9:59 PM
To: FunnyMan Computer; bug-bash@gnu.org
Cc: chet.ra...@case.edu
Subject: Re: Difference in POSIX regular expression for bash's '=~' operator and POSIXLY_CORRECT grep -E

On 5/20/25 3:08 PM, FunnyMan Computer wrote:
> Bash Version: 5.2
> Patch Level: 37
> Release Status: release
>
> Description:
> Bash's '=~' extended POSIX regex seems to behave very different to the
> way grep's -E flag seems to deal with regular expressions.

Bash doesn't provide its own implementation of EREs: it uses whatever libc supplies. I assume that's different from whatever `grep -E' uses.

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/
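For anyone reading this thread later, here is a minimal sketch of the point made in the replies: BASH_REMATCH[0] holds the single overall match, and indices 1 and up hold parenthesized subexpressions, not successive "global" matches. The function `all_matches` is a hypothetical helper (not part of bash) that approximates a global match by repeatedly matching and then consuming the input. It assumes simple, unanchored patterns and uses [[:lower:]] so the result does not depend on how the locale treats [a-z].

```bash
#!/usr/bin/env bash

# Index 0 is the whole match; 1..n are the parenthesized subexpressions.
re='([[:lower:]]+)-([[:lower:]]+)'
if [[ test-tesst =~ $re ]]; then
    for i in "${!BASH_REMATCH[@]}"; do
        echo "$i: ${BASH_REMATCH[$i]}"
    done
fi
# 0: test-tesst
# 1: test
# 2: tesst

# Hypothetical helper: print every non-overlapping match of regex $1 in
# string $2, one per line, by matching and then dropping the consumed prefix.
all_matches() {
    local re=$1 rest=$2
    while [[ -n $rest && $rest =~ $re ]]; do
        # Stop on an empty match so the loop cannot spin forever.
        [[ -n ${BASH_REMATCH[0]} ]] || break
        echo "${BASH_REMATCH[0]}"
        # Remove everything up to and including the first occurrence of
        # the matched text (quoted so it is treated literally).
        rest=${rest#*"${BASH_REMATCH[0]}"}
    done
}

all_matches '[[:lower:]]+' 'test-tesst'
# test
# tesst
```

The regex is kept in a variable and expanded unquoted on the right-hand side of '=~', since quoting it there would make bash treat it as a literal string.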
Difference in POSIX regular expression for bash's '=~' operator and POSIXLY_CORRECT grep -E
Configuration Information [Automatically generated, do not change]:
Machine: x86_64
OS: linux-gnu
Compiler: gcc
Compilation CFLAGS: -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions -Wp,-D_FORTIFY_SOURCE=3 -Wformat -Werror=format-security -fstack-clash-protection -fcf-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -flto=auto -DDEFAULT_PATH_VALUE='/usr/local/sbin:/usr/local/bin:/usr/bin' -DSTANDARD_UTILS_PATH='/usr/bin' -DSYS_BASHRC='/etc/bash.bashrc' -DSYS_BASH_LOGOUT='/etc/bash.bash_logout' -DNON_INTERACTIVE_LOGIN_SHELLS -std=gnu17
uname output: Linux vbox-virtualbox 6.12.28-1-MANJARO #1 SMP PREEMPT_DYNAMIC Fri, 09 May 2025 10:53:27 + x86_64 GNU/Linux
Machine Type: x86_64-pc-linux-gnu

Bash Version: 5.2
Patch Level: 37
Release Status: release

Bash options:
autocd off
assoc_expand_once off
cdable_vars off
cdspell off
checkhash off
checkjobs off
checkwinsize on
cmdhist on
compat31 off
compat32 off
compat40 off
compat41 off
compat42 off
compat43 off
compat44 off
complete_fullquote on
direxpand off
dirspell off
dotglob off
execfail off
expand_aliases on
extdebug off
extglob on
extquote on
failglob off
force_fignore on
globasciiranges on
globskipdots on
globstar off
gnu_errfmt off
histappend on
histreedit off
histverify off
hostcomplete off
huponexit off
inherit_errexit off
interactive_comments on
lastpipe off
lithist off
localvar_inherit off
localvar_unset off
login_shell off
mailwarn off
no_empty_cmd_completion off
nocaseglob off
nocasematch off
noexpand_translation off
nullglob off
patsub_replacement on
progcomp on
progcomp_alias off
promptvars on
restricted_shell off
shift_verbose off
sourcepath on
varredir_close off
xpg_echo off

(I tried turning off extglob and extquote to see whether anything changed, but it made no difference.)

Description:
Bash's '=~' extended POSIX regex matching seems to behave very differently from the way grep's -E flag deals with regular expressions.
I repeatedly failed to get results similar to what I expected from grep, using just the patterns [a-z] and [a-z]+. I expected multiple results in $BASH_REMATCH, but it picks up only 1 character at most, while grep -E is able to pick up all the characters (which is weird, since the pattern [a-z]+$ gives completely similar results).

So I was wondering whether this is a bug, or whether it is intended and I'm just misinterpreting how bash handles regular expressions. I tried reading the bash manual on the '=~' operator (https://www.gnu.org/software/bash/manual/bash.html#index-_005b_005b), but as far as I know (and to the extent of my knowledge of how regular expressions work), this seems like unintended behavior.

Repeat-By:

grep:

`$ echo test-test | POSIXLY_CORRECT=1 grep -E [a-z]`
`^test^-^test^`

`$ echo test-tesst | POSIXLY_CORRECT=1 grep -E [a-z]+`
`^test^-^tesst^`

bash's '=~' and $BASH_REMATCH:

```
$ if [[ test-test =~ [a-z] ]]; then
    for i in "${!BASH_REMATCH[@]}"; do
      echo "$i: ${BASH_REMATCH[$i]}";
    done
  fi
```
`0: t`

```
$ if [[ test-tesst =~ [a-z]+ ]]; then
    for i in "${!BASH_REMATCH[@]}"; do
      echo "$i: ${BASH_REMATCH[$i]}";
    done;