Bruno Haible <br...@clisp.org> wrote:

> Hi Arnold,
>
> > Dot matching newline isn't the issue here.
> >
> > It's ^ matching in the middle of a string. For my purposes, ^ should
> > only match at the beginning of a *string* (as $ should only match at
> > the end of a string). I haven't rechecked POSIX, but this is how awk
> > has behaved since forever.
>
> Hmm. Regarding POSIX: I've read section 9.3.8 and 9.4.9 of [1],
> the description of REG_NOTBOL, REG_NOTEOL in [2], and the description
> of REG_NEWLINE in [3]. If I understand it correctly, within POSIX,
> ".^" should not match a newline because
>   - if REG_NEWLINE is set, '^' matches after the newline but '.' does not
>     match the newline,
>   - if REG_NEWLINE is not set, '.' matches newline but '^' does not match
>     after the newline.

That makes sense. This is why I felt that, for gawk, ".^" is an invalid
regexp. (Indeed, the original Unix awk rejects it as such.)

REG_NEWLINE is not included in any of the RE_*_AWK definitions since
I want exactly the behavior you describe: dot matches newline but ^ does
not match after the newline.

To me this feels very much like a bug.

> However, GNU regex.h also has a flag RE_CONTEXT_INDEP_ANCHORS; I don't know
> what effect it has.

In this case it makes things worse, causing gawk to match ".^" literally.

> > (And how I've documented things in the manual, also since forever.)
>
> If you want the behaviour of the GNU regex to be stable over time, you
> should contribute unit tests to tests/test-regex.c.

This is a separate issue. It almost sounds like you're saying "it's your
fault there's a bug here, you didn't contribute unit tests". I hope that's
not your intent; if it is then sorry, I don't buy it.

In any case, I've supplied a regexp, input data, and in the gawk dist,
a test harness, so that debugging can be done if one of the Gnulib
maintainers will look into this particular issue.

Thanks,

Arnold
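
For reference, the two POSIX rules quoted above can be checked separately
with a small C sketch. It uses the plain POSIX regcomp()/regexec()
interface, not the GNU re_compile_pattern() interface that gawk goes
through, so it says nothing about RE_CONTEXT_INDEP_ANCHORS. The names
ere_matches() and show_newline_rules() are made up for illustration. The
disputed pattern ".^" is deliberately left out, since its result is the
point under discussion.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if the POSIX extended regexp `pattern`
   matches somewhere in `text`, 0 if it does not, and -1 if the pattern
   fails to compile. `extra_cflags` is OR-ed into the regcomp() flags. */
int ere_matches(const char *pattern, const char *text, int extra_cflags)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB | extra_cflags) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}

/* Checks each half of the two rules against the string "a<newline>b". */
void show_newline_rules(void)
{
    const char *text = "a\nb";

    /* REG_NEWLINE not set: '.' matches the newline,
       but '^' does not match after it. */
    assert(ere_matches("a.b", text, 0) == 1);
    assert(ere_matches("^b", text, 0) == 0);

    /* REG_NEWLINE set: '^' matches after the newline,
       but '.' does not match the newline itself. */
    assert(ere_matches("a.b", text, REG_NEWLINE) == 0);
    assert(ere_matches("^b", text, REG_NEWLINE) == 1);
}
```

Calling show_newline_rules() from main() should pass silently on a
conforming regcomp(). In both modes one of the two halves fails on its
own, which is the basis of the argument that ".^" should never match
across a newline.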