On Wed, May 27, 2020 at 11:30 PM Norihiro Tanaka <nori...@kcn.ne.jp> wrote:
> On Tue, 26 May 2020 21:14:12 -0700
> "anton.paras" <an...@paras.nu> wrote:
>
> > I posted to Stack Exchange, and they recommended that I file a bug. I'd 
> > rather not copy+paste it all, so here's the link:
> >
> >
> >
> > https://unix.stackexchange.com/questions/579889/why-doesnt-this-sed-command-replace-the-3rd-to-last-and
> >
> >
> >
> > here's an example
> >
> >
> >
> > > echo 'dog and foo and bar and baz land good' | sed -E 's/(.*)\band\b((.*\band\b){2})/\1XYZ\2/'
> >
> >
> >
> > expected output: dog XYZ foo and bar and baz land good
> >
> > actual output: dog and foo XYZ bar and baz land good
> >
> >
> > here's my sed --version output: sed (GNU sed) 4.2.2
> >
> >
> >
> > I hope this is helpful, cheers!
>
> $ echo 'foo and bar land' | env LC_ALL=en_US.utf8 sed -nE '/(.*\band){2}/p'
> foo and bar land
> $ echo 'foo and bar land' | env LC_ALL=en_US.utf8 sed -nE '/.*\band.*\band/p'
> $
>
> It seems that there is a bug in regex.
>
> expected:
> $ echo 'foo and bar land' | env LC_ALL=en_US.utf8 sed -nE '/(.*\band){2}/p'
> $ echo 'foo and bar land' | env LC_ALL=en_US.utf8 sed -nE '/.*\band.*\band/p'
> $
>
> It also reproduces in grep.
>
> $ echo 'foo and bar land' | env LC_ALL=en_US.utf8 grep -E '(.*\band){2}'
> foo and bar land
> $ echo 'foo and bar land' | env LC_ALL=en_US.utf8 grep -E '.*\band.*\band'
> $

I agree that this looks like a regex bug. This should print nothing:
  echo 'foo and bar land' | env LC_ALL=en_US.utf8 sed -nE '/(.*\band){2}/p'
just as this already does:
  echo 'foo and bar land' | env LC_ALL=C sed -nE '/(.*\band){2}/p'
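
For anyone who wants to check their own system, here is a minimal sketch
of the comparison above as a shell script. The helper name try_match is
hypothetical (it is not part of sed or grep), and the sketch assumes GNU
sed (for -E and \b) and an installed en_US.utf8 locale; without that
locale, sed would typically behave as in the C locale and the problem
would not show. The input has only one "and" that starts at a word
boundary (the "and" inside "land" does not), so a correct matcher should
find no match for (.*\band){2} in either locale.

```shell
#!/bin/sh
# try_match LOCALE ERE INPUT  (hypothetical helper, not part of sed or grep)
# Prints INPUT if the ERE matches it under LOCALE, and nothing otherwise.
# Assumes GNU sed; the ERE must not contain a '/' character.
try_match() {
  printf '%s\n' "$3" | env LC_ALL="$1" sed -nE "/$2/p"
}

input='foo and bar land'
pattern='(.*\band){2}'

# Report, per locale, whether the pattern matched the input.
for loc in C en_US.utf8; do
  if [ -n "$(try_match "$loc" "$pattern" "$input")" ]; then
    echo "$loc: matched (unexpected)"
  else
    echo "$loc: no match (expected)"
  fi
done
```

On an affected system I would expect the en_US.utf8 line to report a
match while the C line reports none.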

Does anyone know if there's a glibc bug number for it?
