glibc already supports \` and \' as absolute input boundaries, while ^ and $ are a bit more flexible (depending on RE_CONTEXT_INDEP_ANCHORS). Worse, ^ and $ are not portable across all regex flavors: in some languages they match at any newline within the larger input. Blindly copying a POSIX BRE or ERE to or from one of those languages yields a regex that matches different inputs. That is a security risk when the regex is used for data validation, because input that deliberately embeds a newline can slip past a regex that is not anchored to a full match. So POSIX is seriously considering a proposal to add new escapes, portable across more languages, that force a match to align with the beginning or end of the absolute input regardless of whether ^ and $ can match at newlines embedded within the input.
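As a rough illustration of the risk described above, the sketch below uses only the standard regcomp/regexec interface. The helper name matches() is hypothetical and not part of glibc. It assumes REG_NEWLINE stands in for a flavor where ^ and $ match at embedded newlines, and that the \` and \' escapes are available, which is a GNU extension that other libc implementations may reject or treat differently.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if INPUT matches PATTERN, 0 if it
   does not, and -1 if PATTERN fails to compile.  */
static int
matches (const char *pattern, int cflags, const char *input)
{
  regex_t re;
  int rc;

  assert (pattern != NULL && input != NULL);

  /* REG_NOSUB: only a yes/no answer is needed, not match offsets.  */
  if (regcomp (&re, pattern, cflags | REG_NOSUB) != 0)
    return -1;

  rc = regexec (&re, input, 0, NULL, 0);
  regfree (&re);
  return rc == 0 ? 1 : 0;
}
```

With the input "abc\n123", matches ("^[a-z]+$", REG_EXTENDED, ...) reports no match, but adding REG_NEWLINE makes the same pattern match the first line alone, which is exactly how a validation check gets bypassed. On glibc, the pattern written as the C string "\\`[a-z]+\\'" should still reject that input even with REG_NEWLINE, since those anchors only match at the absolute ends of the input.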
However, most other languages spell these anchors \A and \z (or sometimes \Z) rather than \` and \'. The easiest way for POSIX to specify something that is portable across languages is to pick an escape that most languages support and that also has existing implementation practice in C. Therefore, a first step is letting GNU regex parse \A and \z identically to \` and \'.

See also:
https://www.austingroupbugs.net/view.php?id=1919
https://best.openssf.org/Correctly-Using-Regular-Expressions
---

I'm also open to the idea of adding a new RE_* flag to opt in to this spelling on a per-compilation basis for re_compile, or even a new REG_* flag for use with regcomp.

---
 posix/regcomp.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/posix/regcomp.c b/posix/regcomp.c
index 69675d81f7..848a2823a3 100644
--- a/posix/regcomp.c
+++ b/posix/regcomp.c
@@ -1885,6 +1885,7 @@ peek_token (re_token_t *token, re_string_t *input, reg_syntax_t syntax)
 	  token->type = OP_NOTSPACE;
 	  break;
 	case '`':
+	case 'A':
 	  if (!(syntax & RE_NO_GNU_OPS))
 	    {
 	      token->type = ANCHOR;
@@ -1892,6 +1893,7 @@ peek_token (re_token_t *token, re_string_t *input, reg_syntax_t syntax)
 	    }
 	  break;
 	case '\'':
+	case 'z':
 	  if (!(syntax & RE_NO_GNU_OPS))
 	    {
 	      token->type = ANCHOR;

base-commit: e78caeb4ff812ae19d24d65f4d4d48508154277b
-- 
2.49.0