Re: [PATCH] Update RE_SYNTAX_EMACS to include features used by GNU Emacs

Bernhard Voelker Wed, 09 Jul 2025 12:27:23 -0700

[+bug-findutils]

Hi Collin,


On 7/9/25 04:17, Collin Funk wrote:

> Bernhard, I built findutils with GNULIB_SRCDIR set to my local
> clone. This uses the latest Gnulib commit instead of the one specified
> by the submodule.
>
> This patch causes the following 'make check' fail in findutils:
>
>      ./../doc/regexprops.texi /tmp/check-regexprops.wUz52k differ: char 1649, line 45
>      ./../doc/regexprops.texi is out of date.
>      Updated output is saved in regexprops.texi.new
>      FAIL check-regexprops (exit status: 1)


Thanks for reporting this ... I incidentally saw this last weekend as well,
but couldn't get around to fixing it yet.

> But with this patch RE_SYNTAX_EMACS is changed. A diff of the generated
> documentation confirms this.
>
> What is the proper way to fix this? My thinking is to first update the
> findutils uses and copy the regexprops.texi.new to regexprops.texi,
> since the new value of RE_SYNTAX_EMACS is more correct based on this
> thread. This file will also have to be copied to Gnulib's
> doc/regexprops-generic.texi, if I understand correctly.

Correct.

1. Commits already pushed to findutils:

* [PATCH 1/3] maint: update gnulib to latest
  https://cgit.git.sv.gnu.org/cgit/findutils.git/commit/?id=c7f5ff1ed88

* [PATCH 2/3] regexprops: sort regex_map alphabetically
  https://cgit.git.sv.gnu.org/cgit/findutils.git/commit/?id=c9c2c511759

* [PATCH 3/3] doc: regenerate regexprops.texi
  https://cgit.git.sv.gnu.org/cgit/findutils.git/commit/?id=facc27e1804

2. Commit (to be pushed) to gnulib - see attachment.
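
For anyone who wants to repeat the regeneration step locally, here is a
rough sketch of it. It is not part of either commit. The helper name
refresh_doc and the FINDUTILS_BUILD placeholder are made up for
illustration, and it assumes that regexprops writes the Texinfo to
standard output. Only the regexprops command line itself is taken from
the commit message in the attachment.

```shell
#!/bin/sh
# Hypothetical helper, not part of findutils or gnulib.
# Usage: refresh_doc GENERATED CURRENT
# Copies GENERATED over CURRENT only when the two files differ, which is
# the same kind of comparison that check-regexprops reports on.
refresh_doc() {
  generated=$1
  current=$2
  if cmp -s "$generated" "$current"; then
    # Identical contents: nothing to do.
    echo "up to date: $current"
  else
    cp "$generated" "$current"
    echo "updated: $current"
  fi
}

# Assumed usage from the top of a gnulib checkout. FINDUTILS_BUILD is a
# placeholder for wherever the regexprops binary was built:
#
#   "$FINDUTILS_BUILD"/regexprops "Regular Expressions" generic > /tmp/generic.texi
#   refresh_doc /tmp/generic.texi doc/regexprops-generic.texi
```

Using cmp before copying keeps the file's timestamp untouched when
nothing changed, so a rerun does not trigger needless doc rebuilds.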

Good to push?

Have a nice day,
Berny

From f3aaeaf5e2d1cbbbd8c90c4389e7204aa079fdcb Mon Sep 17 00:00:00 2001
From: Bernhard Voelker <m...@bernhard-voelker.de>
Date: Wed, 9 Jul 2025 21:06:12 +0200
Subject: [PATCH] regexprops-generic: update from regex.h

* doc/regexprops-generic.texi: Re-generate by running the 'regexprops'
binary from GNU findutils:
  ./regexprops "Regular Expressions" generic
At least the recent(ish) change (efd5c380ff) to regex.h aligning
gnulib with Emacs behavior had made this document out-of-date.
Reported by Collin Funk in
<https://lists.gnu.org/archive/html/bug-gnulib/2025-07/msg00037.html>.
Additionally, today's findutils commit c9c2c51175 fixed the sort order
of the Texinfo nodes.
---
 ChangeLog                   |  13 ++
 doc/regexprops-generic.texi | 228 ++++++++++++++++++++----------------
 2 files changed, 141 insertions(+), 100 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 7913e25423..f8d0053181 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,16 @@
+2025-07-09  Bernhard Voelker  <m...@bernhard-voelker.de>
+
+	regexprops-generic: update from regex.h
+	* doc/regexprops-generic.texi: Re-generate by running the 'regexprops'
+	binary from GNU findutils:
+	  ./regexprops "Regular Expressions" generic
+	At least the recent(ish) change (efd5c380ff) to regex.h aligning
+	gnulib with Emacs behavior had made this document out-of-date.
+	Reported by Collin Funk in
+	<https://lists.gnu.org/archive/html/bug-gnulib/2025-07/msg00037.html>.
+	Additionally, today's findutils commit c9c2c51175 fixed the sort order
+	of the Texinfo nodes.
+
 2025-07-08  Paul Eggert  <egg...@cs.ucla.edu>
 
 	float-h: work around GCC bug 120993
diff --git a/doc/regexprops-generic.texi b/doc/regexprops-generic.texi
index 6de54abda3..9da39526e1 100644
--- a/doc/regexprops-generic.texi
+++ b/doc/regexprops-generic.texi
@@ -1,18 +1,18 @@
-@c Copyright (C) 1994, 1996, 1998, 2000--2001, 2003--2007, 2009--2025 Free
-@c Software Foundation, Inc.
+@c Copyright (C) 1994--2025 Free Software Foundation, Inc.
 @c
 @c Permission is granted to copy, distribute and/or modify this document
 @c under the terms of the GNU Free Documentation License, Version 1.3 or
 @c any later version published by the Free Software Foundation; with no
-@c Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.  A
-@c copy of the license is at <https://www.gnu.org/licenses/fdl-1.3.en.html>.
+@c Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
+@c A copy of the license is included in the ``GNU Free
+@c Documentation License'' file as part of this distribution.
 
 @c this regular expression description is for: generic
 
 @menu
 * awk regular expression syntax::
-* egrep regular expression syntax::
 * ed regular expression syntax::
+* egrep regular expression syntax::
 * emacs regular expression syntax::
 * gnu-awk regular expression syntax::
 * grep regular expression syntax::
@@ -46,21 +46,24 @@ matches a @samp{?}.
 
 Bracket expressions are used to match ranges of characters.  Bracket expressions where the range is backward, for example @samp{[z-a]}, are invalid.  Within square brackets, @samp{\} can be used to quote the following character.  Character classes are supported; for example @samp{[[:digit:]]} will match a single decimal digit.
 
+
 GNU extensions are not supported and so @samp{\w}, @samp{\W}, @samp{\<}, @samp{\>}, @samp{\b}, @samp{\B}, @samp{\`}, and @samp{\'} match @samp{w}, @samp{W}, @samp{<}, @samp{>}, @samp{b}, @samp{B}, @samp{`}, and @samp{'} respectively.
 
+
 Grouping is performed with parentheses @samp{()}.  An unmatched @samp{)} matches just itself.  A backslash followed by a digit matches that digit.
 
 The alternation operator is @samp{|}.
 
 The characters @samp{^} and @samp{$} always represent the beginning and end of a string respectively, except within square brackets.  Within brackets, @samp{^} can be used to invert the membership of the character class being specified.
 
+
 @samp{*}, @samp{+} and @samp{?} are special at any point in a regular expression except:
 @enumerate
 
 @item At the beginning of a regular expression
 
-@item After an open-group, signified by
-@samp{(}
+@item After an open-group, signified by @samp{(}
+
 @item After the alternation operator @samp{|}
 
 @end enumerate
@@ -71,28 +74,28 @@ The characters @samp{^} and @samp{$} always represent the beginning and end of a
 The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups.
 
 
-@node egrep regular expression syntax
-@subsection @samp{egrep} regular expression syntax
+@node ed regular expression syntax
+@subsection @samp{ed} regular expression syntax
 
 
-The character @samp{.} matches any single character.
+The character @samp{.} matches any single character except the null character.
 
 
 @table @samp
 
-@item +
-indicates that the regular expression should match one or more occurrences of the previous atom or regexp.
-@item ?
-indicates that the regular expression should match zero or one occurrence of the previous atom or regexp.
 @item \+
-matches a @samp{+}
+indicates that the regular expression should match one or more occurrences of the previous atom or regexp.
 @item \?
-matches a @samp{?}.
+indicates that the regular expression should match zero or one occurrence of the previous atom or regexp.
+@item + and ?
+match themselves.
+
 @end table
 
 
 Bracket expressions are used to match ranges of characters.  Bracket expressions where the range is backward, for example @samp{[z-a]}, are invalid.  Within square brackets, @samp{\} is taken literally.  Character classes are supported; for example @samp{[[:digit:]]} will match a single decimal digit.
 
+
 GNU extensions are supported:
 @enumerate
 
@@ -115,39 +118,77 @@ GNU extensions are supported:
 @end enumerate
 
 
-Grouping is performed with parentheses @samp{()}.  An unmatched @samp{)} matches just itself.  A backslash followed by a digit acts as a back-reference and matches the same thing as the previous grouped expression indicated by that number.  For example @samp{\2} matches the second group expression.  The order of group expressions is determined by the position of their opening parenthesis @samp{(}.
+Grouping is performed with backslashes followed by parentheses @samp{\(}, @samp{\)}.  A backslash followed by a digit acts as a back-reference and matches the same thing as the previous grouped expression indicated by that number.  For example @samp{\2} matches the second group expression.  The order of group expressions is determined by the position of their opening parenthesis @samp{\(}.
 
-The alternation operator is @samp{|}.
+The alternation operator is @samp{\|}.
 
-The characters @samp{^} and @samp{$} always represent the beginning and end of a string respectively, except within square brackets.  Within brackets, @samp{^} can be used to invert the membership of the character class being specified.
+The character @samp{^} only represents the beginning of a string when it appears:
+@enumerate
 
-The characters @samp{*}, @samp{+} and @samp{?} are special anywhere in a regular expression.
+@item At the beginning of a regular expression
+
+@item After an open-group, signified by @samp{\(}
+
+
+@item After the alternation operator @samp{\|}
+
+@end enumerate
+
+
+The character @samp{$} only represents the end of a string when it appears:
+@enumerate
+
+@item At the end of a regular expression
+
+@item Before a close-group, signified by @samp{\)}
+
+@item Before the alternation operator @samp{\|}
+
+@end enumerate
+
+
+@samp{\*}, @samp{\+} and @samp{\?} are special at any point in a regular expression except:
+@enumerate
+
+@item At the beginning of a regular expression
+
+@item After an open-group, signified by @samp{\(}
+
+@item After the alternation operator @samp{\|}
+
+@end enumerate
+
+
+Intervals are specified by @samp{\@{} and @samp{\@}}.
+Invalid intervals such as @samp{a\@{1z} are not accepted.
 
-Intervals are specified by @samp{@{} and @samp{@}}.  Invalid intervals are treated as literals, for example @samp{a@{1} is treated as @samp{a\@{1}
 
 The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups.
 
 
-@node ed regular expression syntax
-@subsection @samp{ed} regular expression syntax
+@node egrep regular expression syntax
+@subsection @samp{egrep} regular expression syntax
 
 
-The character @samp{.} matches any single character except the null character.
+The character @samp{.} matches any single character.
 
 
 @table @samp
 
-@item \+
+@item +
 indicates that the regular expression should match one or more occurrences of the previous atom or regexp.
-@item \?
+@item ?
 indicates that the regular expression should match zero or one occurrence of the previous atom or regexp.
-@item + and ?
-match themselves.
+@item \+
+matches a @samp{+}
+@item \?
+matches a @samp{?}.
 @end table
 
 
 Bracket expressions are used to match ranges of characters.  Bracket expressions where the range is backward, for example @samp{[z-a]}, are invalid.  Within square brackets, @samp{\} is taken literally.  Character classes are supported; for example @samp{[[:digit:]]} will match a single decimal digit.
 
+
 GNU extensions are supported:
 @enumerate
 
@@ -170,49 +211,18 @@ GNU extensions are supported:
 @end enumerate
 
 
-Grouping is performed with backslashes followed by parentheses @samp{\(}, @samp{\)}.  A backslash followed by a digit acts as a back-reference and matches the same thing as the previous grouped expression indicated by that number.  For example @samp{\2} matches the second group expression.  The order of group expressions is determined by the position of their opening parenthesis @samp{\(}.
-
-The alternation operator is @samp{\|}.
-
-The character @samp{^} only represents the beginning of a string when it appears:
-@enumerate
-
-@item
-At the beginning of a regular expression
-
-@item After an open-group, signified by
-@samp{\(}
-
-@item After the alternation operator @samp{\|}
-
-@end enumerate
-
-
-The character @samp{$} only represents the end of a string when it appears:
-@enumerate
-
-@item At the end of a regular expression
-
-@item Before a close-group, signified by
-@samp{\)}
-@item Before the alternation operator @samp{\|}
-
-@end enumerate
-
+Grouping is performed with parentheses @samp{()}.  An unmatched @samp{)} matches just itself.  A backslash followed by a digit acts as a back-reference and matches the same thing as the previous grouped expression indicated by that number.  For example @samp{\2} matches the second group expression.  The order of group expressions is determined by the position of their opening parenthesis @samp{(}.
 
-@samp{\*}, @samp{\+} and @samp{\?} are special at any point in a regular expression except:
-@enumerate
+The alternation operator is @samp{|}.
 
-@item At the beginning of a regular expression
+The characters @samp{^} and @samp{$} always represent the beginning and end of a string respectively, except within square brackets.  Within brackets, @samp{^} can be used to invert the membership of the character class being specified.
 
-@item After an open-group, signified by
-@samp{\(}
-@item After the alternation operator @samp{\|}
 
-@end enumerate
+The characters @samp{*}, @samp{+} and @samp{?} are special anywhere in a regular expression.
 
 
-Intervals are specified by @samp{\@{} and @samp{\@}}.  Invalid intervals such as @samp{a\@{1z} are not accepted.
+Intervals are specified by @samp{@{} and @samp{@}}.
+Invalid intervals are treated as literals, for example @samp{a@{1} is treated as @samp{a\@{1}
 
 The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups.
 
@@ -237,7 +247,8 @@ matches a @samp{?}.
 @end table
 
 
-Bracket expressions are used to match ranges of characters.  Bracket expressions where the range is backward, for example @samp{[z-a]}, are ignored.  Within square brackets, @samp{\} is taken literally.  Character classes are not supported, so for example you would need to use @samp{[0-9]} instead of @samp{[[:digit:]]}.
+Bracket expressions are used to match ranges of characters.  Bracket expressions where the range is backward, for example @samp{[z-a]}, are ignored.  Within square brackets, @samp{\} is taken literally.  Character classes are supported; for example @samp{[[:digit:]]} will match a single decimal digit.
+
 
 GNU extensions are supported:
 @enumerate
@@ -268,11 +279,10 @@ The alternation operator is @samp{\|}.
 The character @samp{^} only represents the beginning of a string when it appears:
 @enumerate
 
-@item
-At the beginning of a regular expression
+@item At the beginning of a regular expression
+
+@item After an open-group, signified by @samp{\(}
 
-@item After an open-group, signified by
-@samp{\(}
 
 @item After the alternation operator @samp{\|}
 
@@ -284,8 +294,8 @@ The character @samp{$} only represents the end of a string when it appears:
 
 @item At the end of a regular expression
 
-@item Before a close-group, signified by
-@samp{\)}
+@item Before a close-group, signified by @samp{\)}
+
 @item Before the alternation operator @samp{\|}
 
 @end enumerate
@@ -296,13 +306,15 @@ The character @samp{$} only represents the end of a string when it appears:
 
 @item At the beginning of a regular expression
 
-@item After an open-group, signified by
-@samp{\(}
+@item After an open-group, signified by @samp{\(}
+
 @item After the alternation operator @samp{\|}
 
 @end enumerate
 
 
+Intervals are specified by @samp{\@{} and @samp{\@}}.
+Invalid intervals such as @samp{a\@{1z} are not accepted.
 
 
 The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups.
@@ -330,6 +342,7 @@ matches a @samp{?}.
 
 Bracket expressions are used to match ranges of characters.  Bracket expressions where the range is backward, for example @samp{[z-a]}, are invalid.  Within square brackets, @samp{\} can be used to quote the following character.  Character classes are supported; for example @samp{[[:digit:]]} will match a single decimal digit.
 
+
 GNU extensions are supported:
 @enumerate
 
@@ -358,19 +371,21 @@ The alternation operator is @samp{|}.
 
 The characters @samp{^} and @samp{$} always represent the beginning and end of a string respectively, except within square brackets.  Within brackets, @samp{^} can be used to invert the membership of the character class being specified.
 
+
 @samp{*}, @samp{+} and @samp{?} are special at any point in a regular expression except:
 @enumerate
 
 @item At the beginning of a regular expression
 
-@item After an open-group, signified by
-@samp{(}
+@item After an open-group, signified by @samp{(}
+
 @item After the alternation operator @samp{|}
 
 @end enumerate
 
 
-Intervals are specified by @samp{@{} and @samp{@}}.  Invalid intervals are treated as literals, for example @samp{a@{1} is treated as @samp{a\@{1}
+Intervals are specified by @samp{@{} and @samp{@}}.
+Invalid intervals are treated as literals, for example @samp{a@{1} is treated as @samp{a\@{1}
 
 The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups.
 
@@ -390,11 +405,13 @@ indicates that the regular expression should match one or more occurrences of th
 indicates that the regular expression should match zero or one occurrence of the previous atom or regexp.
 @item + and ?
 match themselves.
+
 @end table
 
 
 Bracket expressions are used to match ranges of characters.  Bracket expressions where the range is backward, for example @samp{[z-a]}, are invalid.  Within square brackets, @samp{\} is taken literally.  Character classes are supported; for example @samp{[[:digit:]]} will match a single decimal digit.
 
+
 GNU extensions are supported:
 @enumerate
 
@@ -424,11 +441,10 @@ The alternation operator is @samp{\|}.
 The character @samp{^} only represents the beginning of a string when it appears:
 @enumerate
 
-@item
-At the beginning of a regular expression
+@item At the beginning of a regular expression
+
+@item After an open-group, signified by @samp{\(}
 
-@item After an open-group, signified by
-@samp{\(}
 
 @item After a newline
 
@@ -442,8 +458,8 @@ The character @samp{$} only represents the end of a string when it appears:
 
 @item At the end of a regular expression
 
-@item Before a close-group, signified by
-@samp{\)}
+@item Before a close-group, signified by @samp{\)}
+
 @item Before a newline
 
 @item Before the alternation operator @samp{\|}
@@ -456,8 +472,8 @@ The character @samp{$} only represents the end of a string when it appears:
 
 @item At the beginning of a regular expression
 
-@item After an open-group, signified by
-@samp{\(}
+@item After an open-group, signified by @samp{\(}
+
 @item After a newline
 
 @item After the alternation operator @samp{\|}
@@ -465,7 +481,9 @@ The character @samp{$} only represents the end of a string when it appears:
 @end enumerate
 
 
-Intervals are specified by @samp{\@{} and @samp{\@}}.  Invalid intervals such as @samp{a\@{1z} are not accepted.
+Intervals are specified by @samp{\@{} and @samp{\@}}.
+Invalid intervals such as @samp{a\@{1z} are not accepted.
+
 
 The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups.
 
@@ -492,27 +510,31 @@ matches a @samp{?}.
 
 Bracket expressions are used to match ranges of characters.  Bracket expressions where the range is backward, for example @samp{[z-a]}, are invalid.  Within square brackets, @samp{\} can be used to quote the following character.  Character classes are supported; for example @samp{[[:digit:]]} will match a single decimal digit.
 
+
 GNU extensions are not supported and so @samp{\w}, @samp{\W}, @samp{\<}, @samp{\>}, @samp{\b}, @samp{\B}, @samp{\`}, and @samp{\'} match @samp{w}, @samp{W}, @samp{<}, @samp{>}, @samp{b}, @samp{B}, @samp{`}, and @samp{'} respectively.
 
+
 Grouping is performed with parentheses @samp{()}.  An unmatched @samp{)} matches just itself.  A backslash followed by a digit acts as a back-reference and matches the same thing as the previous grouped expression indicated by that number.  For example @samp{\2} matches the second group expression.  The order of group expressions is determined by the position of their opening parenthesis @samp{(}.
 
 The alternation operator is @samp{|}.
 
 The characters @samp{^} and @samp{$} always represent the beginning and end of a string respectively, except within square brackets.  Within brackets, @samp{^} can be used to invert the membership of the character class being specified.
 
+
 @samp{*}, @samp{+} and @samp{?} are special at any point in a regular expression except the following places, where they are not allowed:
 @enumerate
 
 @item At the beginning of a regular expression
 
-@item After an open-group, signified by
-@samp{(}
+@item After an open-group, signified by @samp{(}
+
 @item After the alternation operator @samp{|}
 
 @end enumerate
 
 
-Intervals are specified by @samp{@{} and @samp{@}}.  Invalid intervals are treated as literals, for example @samp{a@{1} is treated as @samp{a\@{1}
+Intervals are specified by @samp{@{} and @samp{@}}.
+Invalid intervals are treated as literals, for example @samp{a@{1} is treated as @samp{a\@{1}
 
 The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups.
 
@@ -545,6 +567,7 @@ matches a @samp{?}.
 
 Bracket expressions are used to match ranges of characters.  Bracket expressions where the range is backward, for example @samp{[z-a]}, are invalid.  Within square brackets, @samp{\} is taken literally.  Character classes are supported; for example @samp{[[:digit:]]} will match a single decimal digit.
 
+
 GNU extensions are supported:
 @enumerate
 
@@ -573,19 +596,22 @@ The alternation operator is @samp{|}.
 
 The characters @samp{^} and @samp{$} always represent the beginning and end of a string respectively, except within square brackets.  Within brackets, @samp{^} can be used to invert the membership of the character class being specified.
 
+
 @samp{*}, @samp{+} and @samp{?} are special at any point in a regular expression except the following places, where they are not allowed:
 @enumerate
 
 @item At the beginning of a regular expression
 
-@item After an open-group, signified by
-@samp{(}
+@item After an open-group, signified by @samp{(}
+
 @item After the alternation operator @samp{|}
 
 @end enumerate
 
 
-Intervals are specified by @samp{@{} and @samp{@}}.  Invalid intervals such as @samp{a@{1z} are not accepted.
+Intervals are specified by @samp{@{} and @samp{@}}.
+Invalid intervals such as @samp{a@{1z} are not accepted.
+
 
 The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups.
 
@@ -600,6 +626,7 @@ The character @samp{.} matches any single character except the null character.
 
 Bracket expressions are used to match ranges of characters.  Bracket expressions where the range is backward, for example @samp{[z-a]}, are invalid.  Within square brackets, @samp{\} is taken literally.  Character classes are supported; for example @samp{[[:digit:]]} will match a single decimal digit.
 
+
 GNU extensions are supported:
 @enumerate
 
@@ -629,11 +656,10 @@ Grouping is performed with backslashes followed by parentheses @samp{\(}, @samp{
 The character @samp{^} only represents the beginning of a string when it appears:
 @enumerate
 
-@item
-At the beginning of a regular expression
+@item At the beginning of a regular expression
+
+@item After an open-group, signified by @samp{\(}
 
-@item After an open-group, signified by
-@samp{\(}
 
 @end enumerate
 
@@ -643,18 +669,20 @@ The character @samp{$} only represents the end of a string when it appears:
 
 @item At the end of a regular expression
 
-@item Before a close-group, signified by
-@samp{\)}
+@item Before a close-group, signified by @samp{\)}
+
 @end enumerate
 
 
 
 
-Intervals are specified by @samp{\@{} and @samp{\@}}.  Invalid intervals such as @samp{a\@{1z} are not accepted.
+Intervals are specified by @samp{\@{} and @samp{\@}}.
+Invalid intervals such as @samp{a\@{1z} are not accepted.
+
 
 The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups.
 
 
 @node sed regular expression syntax
 @subsection @samp{sed} regular expression syntax
-This is a synonym for ed.
+This is a synonym for ed.
\ No newline at end of file
-- 
2.50.0
