Re: Difference in POSIX regular expression for bash's '=~' operator and POSIXLY_CORRECT grep -E

2025-05-20 Thread FunnyMan Computer
Andreas Schwab wrote:
> Why?  There are no subexpressions in your regexps.

My bad. I really should have looked twice at what the manual says about
BASH_REMATCH and checked what subexpressions are supposed to produce.
I thought they were a synonym for the multiple results returned by a global
flag, thinking that bash's '=~' had a permanent global flag set (which is why
I was comparing it with grep -E). That seems to be something that isn't
currently implemented.
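
For anyone else who trips over this, here is a minimal sketch of what I
understand now. Index 0 of BASH_REMATCH holds the whole match and indexes 1..n
hold the parenthesized subexpressions. The all_matches function is a
hypothetical helper of my own (not anything bash provides) that imitates a
"global" match by matching repeatedly and cutting off the consumed prefix. It
assumes an unanchored pattern.

```shell
#!/usr/bin/env bash

# Subexpressions: index 0 is the whole match, 1..n are the parenthesized groups.
re='([a-z]+)-([a-z]+)'
if [[ test-tesst =~ $re ]]; then
    for i in "${!BASH_REMATCH[@]}"; do
        echo "$i: ${BASH_REMATCH[$i]}"
    done
fi
# prints:
# 0: test-tesst
# 1: test
# 2: tesst

# all_matches STRING REGEX (hypothetical helper, assumes an unanchored REGEX):
# print every successive match, one per line.
all_matches() {
    local str=$1 re=$2
    while [[ -n $str && $str =~ $re ]]; do
        # Stop on an empty match so the loop cannot spin forever.
        [[ -n ${BASH_REMATCH[0]} ]] || break
        printf '%s\n' "${BASH_REMATCH[0]}"
        # Remove everything up to and including the first occurrence of the
        # matched text (quoted, so it is treated literally, not as a glob).
        str=${str#*"${BASH_REMATCH[0]}"}
    done
}

all_matches test-tesst '[a-z]+'
# prints:
# test
# tesst
```

The loop gives roughly what I was expecting from '=~' in the first place: one
result per match rather than one result per subexpression.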

Chet Ramey wrote:
> Bash doesn't provide its own implementation of EREs: it uses whatever libc
> supplies. I assume that's different from whatever `grep -E' uses.

Greg Wooledge wrote:
> If you want [a-z] to work like ASCII does, you'll need to use LC_CTYPE=C.
>
> If you want to match lowercase letters in your current locale, you
> should use [[:lower:]] instead.

Thank you for that though. It's very interesting, to say the least, to get an
idea of the differences between ERE implementations, and at the very least
I'll make sure to set my LC_CTYPE to C from now on.
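
A small sketch of how I plan to apply Greg's two suggestions. The function
names are hypothetical, and I'm assuming that scoping the locale with
`local LC_ALL=C` inside a function is acceptable here; I use LC_ALL rather
than LC_CTYPE alone only because it overrides every other locale category.

```shell
#!/usr/bin/env bash

# is_ascii_lower WORD (hypothetical helper): succeed only if WORD consists
# entirely of ASCII a-z. The C locale applies inside this function only.
is_ascii_lower() {
    local LC_ALL=C
    local re='^[a-z]+$'
    [[ $1 =~ $re ]]
}

# is_locale_lower WORD (hypothetical helper): succeed only if WORD consists
# entirely of characters the current locale classifies as lowercase.
is_locale_lower() {
    local re='^[[:lower:]]+$'
    [[ $1 =~ $re ]]
}

for word in test Test; do
    if is_ascii_lower "$word"; then
        echo "$word: ASCII lowercase"
    else
        echo "$word: not ASCII lowercase"
    fi
done
# prints:
# test: ASCII lowercase
# Test: not ASCII lowercase
```

What is_locale_lower accepts beyond plain ASCII depends on the locale in
effect, so I haven't listed expected output for it.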


From: Chet Ramey 
Sent: Tuesday, May 20, 2025 9:59 PM
To: FunnyMan Computer; bug-bash@gnu.org
Cc: chet.ra...@case.edu
Subject: Re: Difference in POSIX regular expression for bash's '=~' operator
and POSIXLY_CORRECT grep -E

On 5/20/25 3:08 PM, FunnyMan Computer wrote:

> Bash Version: 5.2
> Patch Level: 37
> Release Status: release
>


> Description:
>   Bash's '=~' extended POSIX regex seems to behave very different to the 
> way grep's -E flag seems to deal with regular expressions.

Bash doesn't provide its own implementation of EREs: it uses whatever libc
supplies. I assume that's different from whatever `grep -E' uses.

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/


Difference in POSIX regular expression for bash's '=~' operator and POSIXLY_CORRECT grep -E

2025-05-20 Thread FunnyMan Computer
Configuration Information [Automatically generated, do not change]:
Machine: x86_64
OS: linux-gnu
Compiler: gcc
Compilation CFLAGS: -march=x86-64 -mtune=generic -O2 -pipe -fno-plt 
-fexceptions -Wp,-D_FORTIFY_SOURCE=3 -Wformat -Werror=format-security   
  -fstack-clash-protection -fcf-protection -fno-omit-frame-pointer 
-mno-omit-leaf-frame-pointer -flto=auto 
-DDEFAULT_PATH_VALUE='/usr/local/sbin:/usr/local/bin:/usr/bin' 
-DSTANDARD_UTILS_PATH='/usr/bin' -DSYS_BASHRC='/etc/bash.bashrc' 
-DSYS_BASH_LOGOUT='/etc/bash.bash_logout' -DNON_INTERACTIVE_LOGIN_SHELLS 
-std=gnu17
uname output: Linux vbox-virtualbox 6.12.28-1-MANJARO #1 SMP PREEMPT_DYNAMIC 
Fri, 09 May 2025 10:53:27 + x86_64 GNU/Linux
Machine Type: x86_64-pc-linux-gnu

Bash Version: 5.2
Patch Level: 37
Release Status: release

Bash options:
autocd                    off
assoc_expand_once         off
cdable_vars               off
cdspell                   off
checkhash                 off
checkjobs                 off
checkwinsize              on
cmdhist                   on
compat31                  off
compat32                  off
compat40                  off
compat41                  off
compat42                  off
compat43                  off
compat44                  off
complete_fullquote        on
direxpand                 off
dirspell                  off
dotglob                   off
execfail                  off
expand_aliases            on
extdebug                  off
extglob                   on
extquote                  on
failglob                  off
force_fignore             on
globasciiranges           on
globskipdots              on
globstar                  off
gnu_errfmt                off
histappend                on
histreedit                off
histverify                off
hostcomplete              off
huponexit                 off
inherit_errexit           off
interactive_comments      on
lastpipe                  off
lithist                   off
localvar_inherit          off
localvar_unset            off
login_shell               off
mailwarn                  off
no_empty_cmd_completion   off
nocaseglob                off
nocasematch               off
noexpand_translation      off
nullglob                  off
patsub_replacement        on
progcomp                  on
progcomp_alias            off
promptvars                on
restricted_shell          off
shift_verbose             off
sourcepath                on
varredir_close            off
xpg_echo                  off

(I tried turning off extglob and extquote to see whether anything changed,
but it made no difference.)
Description:
  Bash's '=~' extended POSIX regex seems to behave very differently from the
way grep's -E flag deals with regular expressions.
I repeatedly failed to get results similar to what I expected from grep when
using just the [a-z] and [a-z]+ patterns. I expected multiple results in
$BASH_REMATCH, but it only picks up 1 character at most, while grep -E is
able to pick up all the characters (which is weird, since the pattern
[a-z]+$ gives completely similar results).
So I was wondering whether this is a bug, or whether it is intended and I'm
just misinterpreting how bash does regular expressions. I tried reading the
bash manual on the '=~' operator,
-> https://www.gnu.org/software/bash/manual/bash.html#index-_005b_005b,
but as far as I can tell (and to the extent of my knowledge of how regular
expressions work), this seems like unintended behavior.
Repeat-By:
  grep:
`$ echo test-test | POSIXLY_CORRECT=1 grep -E [a-z]`
`^test^-^test^`

`$ echo test-tesst | POSIXLY_CORRECT=1 grep -E [a-z]+`
`^test^-^tesst^`

bash's '=~' and $BASH_REMATCH:
```
$ if [[ test-test =~ [a-z] ]]; then
    for i in "${!BASH_REMATCH[@]}"; do
        echo "$i: ${BASH_REMATCH[$i]}";
    done
fi
```
`0: t`
```
$ if [[ test-tesst =~ [a-z]+ ]]; then
    for i in "${!BASH_REMATCH[@]}"; do
        echo "$i: ${BASH_REMATCH[$i]}";
    done;