Updated the comment. The Emacs revision in which RE_CHAR_CLASSES was enabled is d24873d4, so syntax 0 must have been used for all releases before that.

On 4/11/25 22:04, Eric Blake wrote:
On Fri, Apr 11, 2025 at 04:52:59PM +0300, Vladimir Gorsunov wrote:
   When GNU Emacs switched to using gnulib for regular expression
   functionality in the etags program, some features stopped working
   (please see https://debbugs.gnu.org/cgi/bugreport.cgi?bug=76945 for
   details). That is because the RE_SYNTAX_EMACS flag combination in
   gnulib doesn't have the corresponding flags set. This value should
   be updated to fix etags and to better reflect the set of features
   GNU Emacs is using at the moment.
 From 76f937ae2eacb3649117e7f4c05819e82a7c42a9 Mon Sep 17 00:00:00 2001
From: vg <v...@glums.kodeks.ru>
Date: Fri, 11 Apr 2025 16:28:29 +0300
Subject: [PATCH] Update RE_SYNTAX_EMACS to include features used by GNU Emacs

* lib/regex.h: macro update
* doc/regex.texi: documentation update
---
  doc/regex.texi | 3 ++-
  lib/regex.h    | 3 ++-
  2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/doc/regex.texi b/doc/regex.texi
index cba1e13520..9917a418be 100644
--- a/doc/regex.texi
+++ b/doc/regex.texi
@@ -316,7 +316,8 @@ regular expressions.
  The predefined syntaxes---taken directly from @file{regex.h}---are:
@smallexample
-#define RE_SYNTAX_EMACS 0
+# define RE_SYNTAX_EMACS                                               \
+  (RE_CHAR_CLASSES | RE_INTERVALS)
Hmm.  GNU m4 1.4.19 documents that its regex engine matches Emacs,
but that's only because m4 uses syntax 0.  If this change is made in
gnulib, then either the m4 manual needs to be patched to state that it
is similar to Emacs except for lacking character classes and intervals,
or we make a non-backwards-compatible change in m4 by actually using
RE_SYNTAX_EMACS instead of 0 for the default syntax.

Since there's already another long thread on how m4 does not match
current Emacs regex, and on why enabling intervals would break at least
Autoconf 2.72, I'm inclined to update the m4 manual rather than use
RE_SYNTAX_EMACS, whether or not this patch is accepted.

What's more, this patch is incomplete; if you change RE_SYNTAX_EMACS,
then you also need to change this paragraph:

/* The following bits are used to determine the regexp syntax we
    recognize.  The set/not-set meanings are chosen so that Emacs syntax
    remains the value 0.  The bits are given in alphabetical order, and
    the definitions shifted by one from the previous bit; thus, when we
    add or remove a bit, only one other definition need change.  */
From 0b7b548c2a547ab84adb0001e7d0629b5b6cb6f8 Mon Sep 17 00:00:00 2001
From: Vladimir Gorsunov <gorsu...@gmail.com>
Date: Sun, 13 Apr 2025 12:18:33 +0300
Subject: [PATCH] Update RE_SYNTAX_EMACS to include features used by GNU Emacs

* lib/regex.h: macro update
* doc/regex.texi: documentation update
---
 doc/regex.texi |  3 ++-
 lib/regex.h    | 12 +++++++-----
 2 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/doc/regex.texi b/doc/regex.texi
index cba1e13520..9917a418be 100644
--- a/doc/regex.texi
+++ b/doc/regex.texi
@@ -316,7 +316,8 @@ regular expressions.
 The predefined syntaxes---taken directly from @file{regex.h}---are:
 
 @smallexample
-#define RE_SYNTAX_EMACS 0
+# define RE_SYNTAX_EMACS						\
+  (RE_CHAR_CLASSES | RE_INTERVALS)
 
 #define RE_SYNTAX_AWK                                                   \
   (RE_BACKSLASH_ESCAPE_IN_LISTS | RE_DOT_NOT_NULL                       \
diff --git a/lib/regex.h b/lib/regex.h
index 67a3aa70a5..316a8e48fd 100644
--- a/lib/regex.h
+++ b/lib/regex.h
@@ -65,10 +65,11 @@ typedef long int s_reg_t;
 typedef unsigned long int active_reg_t;
 
 /* The following bits are used to determine the regexp syntax we
-   recognize.  The set/not-set meanings are chosen so that Emacs syntax
-   remains the value 0.  The bits are given in alphabetical order, and
-   the definitions shifted by one from the previous bit; thus, when we
-   add or remove a bit, only one other definition need change.  */
+   recognize.  The set/not-set meanings are chosen so that the value 0
+   is the syntax originally used by Emacs (before 21.1, when features
+   started to be added).  The bits are given in alphabetical order, and
+   the definitions shifted by one from the previous bit; thus, when
+   we add or remove a bit, only one other definition need change.  */
 typedef unsigned long int reg_syntax_t;
 
 #ifdef __USE_GNU
@@ -215,7 +216,8 @@ extern reg_syntax_t re_syntax_options;
    (The [[[ comments delimit what gets put into the Texinfo file, so
    don't delete them!)  */
 /* [[[begin syntaxes]]] */
-# define RE_SYNTAX_EMACS 0
+# define RE_SYNTAX_EMACS						\
+  (RE_CHAR_CLASSES | RE_INTERVALS)
 
 # define RE_SYNTAX_AWK							\
   (RE_BACKSLASH_ESCAPE_IN_LISTS   | RE_DOT_NOT_NULL			\
-- 
2.31.1
