On Wed, May 27, 2020 at 11:30 PM Norihiro Tanaka <nori...@kcn.ne.jp> wrote:
> On Tue, 26 May 2020 21:14:12 -0700
> "anton.paras" <an...@paras.nu> wrote:
>
> > I posted to Stack Exchange, and they recommended that I file a bug. I'd
> > rather not copy+paste it all, so here's the link:
> >
> > https://unix.stackexchange.com/questions/579889/why-doesnt-this-sed-command-replace-the-3rd-to-last-and
> >
> > here's an example
> >
> > echo 'dog and foo and bar and baz land good' | sed -E 's/(.*)\band\b((.*\band\b){2})/\1XYZ\2/'
> >
> > expected output: dog XYZ foo and bar and baz land good
> > actual output: dog and foo XYZ bar and baz land good
> >
> > here's my sed --version output: sed (GNU sed) 4.2.2
> >
> > I hope this is helpful, cheers!
>
> $ echo 'foo and bar land' | env LC_ALL=en_US.utf8 sed -nE '/(.*\band){2}/p'
> foo and bar land
> $ echo 'foo and bar land' | env LC_ALL=en_US.utf8 sed -nE '/.*\band.*\band/p'
> $
>
> It seems that there is the bug in regex.
>
> expected:
> $ echo 'foo and bar land' | env LC_ALL=en_US.utf8 sed -nE '/(.*\band){2}/p'
> $ echo 'foo and bar land' | env LC_ALL=en_US.utf8 sed -nE '/.*\band.*\band/p'
> $
>
> It also reproduces in grep.
>
> $ echo 'foo and bar land' | env LC_ALL=en_US.utf8 grep -E '(.*\band){2}'
> foo and bar land
> $ echo 'foo and bar land' | env LC_ALL=en_US.utf8 grep -E '.*\band.*\band'
> $

I agree that this looks like a regex bug. This should print nothing:

  echo 'foo and bar land' | env LC_ALL=en_US.utf8 sed -nE '/(.*\band){2}/p'

just as this already does:

  echo 'foo and bar land' | env LC_ALL=C sed -nE '/(.*\band){2}/p'

Does anyone know if there's a glibc bug number for it?
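
In case it helps anyone compare locales quickly, here is a minimal sketch that runs both spellings of the pattern under each locale. The helper name `check` is made up for illustration, and the sketch assumes a grep that supports `-q` and `\b` (GNU grep does). If the en_US.utf8 locale is not installed, those runs silently fall back to the C locale. Going by the report above, the two patterns should agree with each other in every locale, and a disagreement would point at the bug.

```shell
# check LOCALE PATTERN INPUT  (hypothetical helper, not part of sed or grep)
# Prints "match" or "no match" for an ERE applied to one input line.
check() {
    if printf '%s\n' "$3" | env LC_ALL="$1" grep -qE -- "$2"; then
        echo "match"
    else
        echo "no match"
    fi
}

input='foo and bar land'

# The input contains only one "and" that starts at a word boundary
# (the "and" inside "land" does not), so neither pattern should match.
for loc in C en_US.utf8; do
    printf '%s  (.*\\band){2}    : %s\n' "$loc" "$(check "$loc" '(.*\band){2}' "$input")"
    printf '%s  .*\\band.*\\band  : %s\n' "$loc" "$(check "$loc" '.*\band.*\band' "$input")"
done
```

Any line that reports "match" for this input would be the unexpected result.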