On Sun, Jul 17, 2011, Matthias Kilian wrote:
 
> Then those ports should be fixed. There seem to be more GNUisms in
> (recent?) GNU grep that are picked up by projects, for example the
> use of \s and \S in pxltoraster (currently a disabled part of
> ghostscript, for which I've got a diff and waiting for some more
> test results).
> 
> I understand that \<...\> is quite convenient, but where's the line
> between convenience and feature bloat?

ooo, maybe I can add \s too. :)

I don't know that there's a good answer to give here.  I even think a
little about putting such things in the libc regcomp, but that seems
somewhat riskier.  Then again, to quote the re_format man page, "The
syntax for word boundaries is incredibly ugly."

I don't think we really want to emulate all of pcre necessarily, but
that is what people think of when they hear "you can enter a regular
expression here" because all the extra \escapes are what's offered by
pcre/perl/python/ruby/javascript/you name it.  And they are mostly
backwards compatible with extended REs.
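
For illustration only, here is a minimal sketch of those extra escapes in one of the Perl-style engines named above. It uses Python's re module, which is an assumption of this example and not something grep or libc regcomp provides; the sample strings are made up.

```python
import re

# \s and \S match whitespace and non-whitespace characters, the same
# escapes mentioned earlier in the thread.
fields = re.split(r"\s+", "alpha  beta\tgamma")

# \b is the Perl-style word boundary. It plays roughly the role that
# \< and \> (or [[:<:]] and [[:>:]]) play in grep.
whole_word = re.search(r"\bcat\b", "a cat sat") is not None
inside_word = re.search(r"\bcat\b", "concatenate") is not None

print(fields, whole_word, inside_word)
```

The pattern body is otherwise ordinary extended-RE syntax, which is the sense in which these escapes are mostly backwards compatible.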

posix does say "The interpretation of an ordinary character preceded by
a backslash ( '\' ) is undefined." for both BREs and EREs, so adding
additional \escapes cannot cause trouble for a properly written regex.

Fun fact about posix:  It doesn't specify [[:<:]] or -w.  So a 100%
posix grep is incapable of matching word boundaries at all.  I can hear
the screaming now if somebody proposed being strictly conformant.

Regular expressions are a serious shortcoming in posix.  EREs don't even
have backrefs; you have to use dinosaur syntax BREs.  How silly is that?
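
As a rough sketch of what is being missed, this is a backreference written with ERE-style unescaped parentheses. It again uses Python's re module purely as a stand-in for the Perl-style engines; it says nothing about what any particular grep or regcomp accepts, and the pattern and inputs are invented for the example.

```python
import re

# ERE-style grouping and alternation, plus \1 to require that the same
# text the group matched appears again immediately afterwards.
doubled = re.compile(r"(ab|cd)\1")

hit = doubled.search("xxcdcdxx")    # "cd" followed by "cd"
miss = doubled.search("xxabcdxx")   # "ab" followed by "cd", not a repeat

print(hit.group(0) if hit else None, miss)
```

In a POSIX BRE the grouping would have to be spelled \(...\)\1, and BREs have no portable alternation operator, so this pattern has no direct BRE equivalent.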
