On Sun, Jul 17, 2011, Matthias Kilian wrote:
> Then those ports should be fixed. There seem to be more GNUisms in
> (recent?) GNU grep that are picked up by projects, for example the
> use of \s and \S in pxltoraster (currently a disabled part of
> ghostscript, for which I've got a diff and waiting for some more
> test results).
>
> I understand that \<...\> is quite convenient, but where's the line
> between convenience and feature bloat?

ooo, maybe I can add \s too. :)

I don't know that there's a good answer to give here. I've even thought
a little about putting such things in the libc regcomp, but that seems
somewhat riskier. Then again, to quote the re_format man page, "The
syntax for word boundaries is incredibly ugly."

I don't think we necessarily want to emulate all of pcre, but that is
what people think of when they hear "you can enter a regular expression
here", because all the extra \escapes are what's offered by
pcre/perl/python/ruby/javascript/you name it. And they are mostly
backwards compatible with extended REs. posix does say "The
interpretation of an ordinary character preceded by a backslash ( '\' )
is undefined." for both BREs and EREs, so adding additional \escapes
cannot cause trouble for a properly written regex.

Fun fact about posix: It doesn't specify [[:<:]] or -w. So a 100% posix
grep is incapable of matching word boundaries at all. I can hear the
screaming now if somebody proposed being strictly conformant.

Regular expressions are a serious shortcoming in posix. EREs don't even
have backrefs; you have to use dinosaur syntax BREs. How silly is that?
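
To make the word boundary point concrete, here is a rough sketch. It
uses Python's re module purely as a stand-in, not grep or the libc
regcomp, so the exact syntax is an assumption. It compares the \b escape
with a spelling that sticks to things EREs also have (anchors,
alternation, a negated bracket expression). The sample strings are made
up for illustration.

```python
import re

# Word boundary via the \b escape (a perl/pcre-style extension, not posix).
with_escape = re.compile(r"\bcat\b")

# Roughly the same idea using only ERE-style constructs.
# Caveat: unlike \b, this consumes the neighbouring character, so the
# matched text is wider than just "cat".
ere_style = re.compile(r"(^|[^A-Za-z0-9_])cat([^A-Za-z0-9_]|$)")

# Hypothetical sample inputs.
samples = ["cat", "a cat sat", "concatenate", "cat.", "bobcat"]

# Map each sample to (matched with \b, matched with the ERE-style form).
results = {
    s: (bool(with_escape.search(s)), bool(ere_style.search(s)))
    for s in samples
}

for text, (escaped, spelled_out) in results.items():
    print(f"{text!r}: escape={escaped} ere_style={spelled_out}")
```

Both forms agree on whether these lines match, which is all grep needs,
but the spelled-out version is clumsy enough to show why people reach
for the escapes.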