> On Fri, Apr 11, 2025 at 04:52:59PM +0300, Vladimir Gorsunov wrote:
> >   When GNU Emacs switched to using gnulib for regular expression
> >   functionality in the etags program, some features stopped working
> >   (please see https://debbugs.gnu.org/cgi/bugreport.cgi?bug=76945 for
> >   details). That is because the RE_SYNTAX_EMACS flag combination in gnulib
> >   doesn't have the corresponding flags set. This value should be updated
> >   to fix etags and to better reflect the set of features GNU Emacs is
> >   using at the moment.
>
> > From 76f937ae2eacb3649117e7f4c05819e82a7c42a9 Mon Sep 17 00:00:00 2001
> > From: vg <v...@glums.kodeks.ru>
> > Date: Fri, 11 Apr 2025 16:28:29 +0300
> > Subject: [PATCH] Update RE_SYNTAX_EMACS to include features used by GNU 
> > Emacs
> >
> > * lib/regex.h: macro update
> > * doc/regex.texi: documentation update
> > ---
> >  doc/regex.texi | 3 ++-
> >  lib/regex.h    | 3 ++-
> >  2 files changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/doc/regex.texi b/doc/regex.texi
> > index cba1e13520..9917a418be 100644
> > --- a/doc/regex.texi
> > +++ b/doc/regex.texi
> > @@ -316,7 +316,8 @@ regular expressions.
> >  The predefined syntaxes---taken directly from @file{regex.h}---are:
> >
> >  @smallexample
> > -#define RE_SYNTAX_EMACS 0
> > +# define RE_SYNTAX_EMACS                                             \
> > +  (RE_CHAR_CLASSES | RE_INTERVALS)
>
> Hmm.  GNU m4 1.4.19 documents that its regex engine matches emacs -
> but that's only because m4 uses syntax 0.  If this change is made in
> gnulib, then either the m4 manual needs to be patched to state that it is
> similar to emacs except for lacking character classes and intervals,
> or we make a non-backwards-compatible change in m4 by actually using
> RE_SYNTAX_EMACS instead of 0 for the default syntax.
>
> Since there's already another long thread on how m4 does not match
> current emacs regex but why enabling intervals would break at least
> autoconf 2.72, I'm inclined to update the m4 manual rather than use
> RE_SYNTAX_EMACS, whether or not this patch is accepted.

I'm having a bit of trouble following, but this is relevant to me, so
I'd like to ask the following questions:

1) <regex.h> has two interfaces: the older GNU one (re_compile_pattern(),
re_search(), and so on) and the POSIX one with regcomp() and regexec();
glibc and gnulib provide both. What I've noticed is an inconsistency in
syntax between the two:

    # m4 regexp that matches:
    regexp(`foo', `[a-z]+')

The same pattern does not match through the POSIX interface:

    regex_t re;
    regcomp(&re, "[a-z]+", 0);   /* cflags 0 selects POSIX basic syntax (BRE) */
    assert(regexec(&re, "foo", 0, NULL, 0) == REG_NOMATCH);

The reason is that regcomp() without REG_EXTENDED compiles a POSIX
basic regular expression (BRE), in which + is an ordinary character.
glibc accepts [a-z]\+ there as a GNU extension; the portable spellings
are [a-z][a-z]* as a BRE, or [a-z]+ compiled with REG_EXTENDED. So the
question is: does this mean the two interfaces have incompatible
default syntaxes? I don't think that's clarified in either the glibc
manual
<https://www.gnu.org/software/libc/manual/html_node/Regular-Expressions.html>
or gnulib's
<https://www.gnu.org/software/gnulib/manual/html_node/The-Backslash-Character.html>.
Perhaps gnulib can stay agnostic about this (although it may be worth a
mention), but glibc should certainly mention it.

2) Is a change to the regex syntax planned in gnulib, glibc, or m4? If
m4 breaks backwards compatibility, how will existing m4 scripts be
fixed? Wouldn't that be nontrivial?

3) What syntax does m4 follow, after all? Should it still be called the
Emacs syntax, or will that passage be changed in the manual?

Regards,
Nikolaos Chatzikonstantinou
