[+bug-findutils]

Hi Collin,

On 7/9/25 04:17, Collin Funk wrote:
> Bernhard, I built findutils with GNULIB_SRCDIR set to my local clone.
> This uses the latest Gnulib commit instead of the one specified by the
> submodule. This patch causes the following 'make check' failure in
> findutils:
>
>   ./../doc/regexprops.texi /tmp/check-regexprops.wUz52k differ: char 1649, line 45
>   ./../doc/regexprops.texi is out of date. Updated output is saved in regexprops.texi.new
>   FAIL check-regexprops (exit status: 1)

Thanks for reporting this ... I incidentally saw this last weekend as well,
but couldn't get around to fixing it yet.

> But with this patch RE_SYNTAX_EMACS is changed. A diff of the generated
> documentation confirms this.
>
> What is the proper way to fix this? My thinking is to first update the
> findutils uses and copy the regexprops.texi.new to regexprops.texi,
> since the new value of RE_SYNTAX_EMACS is more correct based on this
> thread. This file will also have to be copied to Gnulib's
> doc/regexprops-generic.texi, if I understand correctly.

Correct.

1. Commits already pushed to findutils:

 * [PATCH 1/3] maint: update gnulib to latest
   https://cgit.git.sv.gnu.org/cgit/findutils.git/commit/?id=c7f5ff1ed88

 * [PATCH 2/3] regexprops: sort regex_map alphabetically
   https://cgit.git.sv.gnu.org/cgit/findutils.git/commit/?id=c9c2c511759

 * [PATCH 3/3] doc: regenerate regexprops.texi
   https://cgit.git.sv.gnu.org/cgit/findutils.git/commit/?id=facc27e1804

2. Commit (to be pushed) to gnulib - see attachment.

Good to push?

Have a nice day,
Berny
From f3aaeaf5e2d1cbbbd8c90c4389e7204aa079fdcb Mon Sep 17 00:00:00 2001
From: Bernhard Voelker <m...@bernhard-voelker.de>
Date: Wed, 9 Jul 2025 21:06:12 +0200
Subject: [PATCH] regexprops-generic: update from regex.h

* doc/regexprops-generic.texi: Re-generate by running the 'regexprops'
binary from GNU findutils:
  ./regexprops "Regular Expressions" generic
At least the recent(ish) change (efd5c380ff) to regex.h aligning
gnulib with Emacs behavior had made this document out-of-date.
Reported by Collin Funk in
<https://lists.gnu.org/archive/html/bug-gnulib/2025-07/msg00037.html>.
Additionally, today's findutils commit c9c2c51175 fixed the sort order
of the Texinfo nodes.
---
 ChangeLog                   |  13 ++
 doc/regexprops-generic.texi | 228 ++++++++++++++++++++----------------
 2 files changed, 141 insertions(+), 100 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 7913e25423..f8d0053181 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,16 @@
+2025-07-09  Bernhard Voelker  <m...@bernhard-voelker.de>
+
+	regexprops-generic: update from regex.h
+	* doc/regexprops-generic.texi: Re-generate by running the 'regexprops'
+	binary from GNU findutils:
+	  ./regexprops "Regular Expressions" generic
+	At least the recent(ish) change (efd5c380ff) to regex.h aligning
+	gnulib with Emacs behavior had made this document out-of-date.
+	Reported by Collin Funk in
+	<https://lists.gnu.org/archive/html/bug-gnulib/2025-07/msg00037.html>.
+	Additionally, today's findutils commit c9c2c51175 fixed the sort order
+	of the Texinfo nodes.
+
 2025-07-08  Paul Eggert  <egg...@cs.ucla.edu>
 
 	float-h: work around GCC bug 120993
diff --git a/doc/regexprops-generic.texi b/doc/regexprops-generic.texi
index 6de54abda3..9da39526e1 100644
--- a/doc/regexprops-generic.texi
+++ b/doc/regexprops-generic.texi
@@ -1,18 +1,18 @@
-@c Copyright (C) 1994, 1996, 1998, 2000--2001, 2003--2007, 2009--2025 Free
-@c Software Foundation, Inc.
+@c Copyright (C) 1994--2025 Free Software Foundation, Inc.
@c @c Permission is granted to copy, distribute and/or modify this document @c under the terms of the GNU Free Documentation License, Version 1.3 or @c any later version published by the Free Software Foundation; with no -@c Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A -@c copy of the license is at <https://www.gnu.org/licenses/fdl-1.3.en.html>. +@c Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. +@c A copy of the license is included in the ``GNU Free +@c Documentation License'' file as part of this distribution. @c this regular expression description is for: generic @menu * awk regular expression syntax:: -* egrep regular expression syntax:: * ed regular expression syntax:: +* egrep regular expression syntax:: * emacs regular expression syntax:: * gnu-awk regular expression syntax:: * grep regular expression syntax:: @@ -46,21 +46,24 @@ matches a @samp{?}. Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example @samp{[z-a]}, are invalid. Within square brackets, @samp{\} can be used to quote the following character. Character classes are supported; for example @samp{[[:digit:]]} will match a single decimal digit. + GNU extensions are not supported and so @samp{\w}, @samp{\W}, @samp{\<}, @samp{\>}, @samp{\b}, @samp{\B}, @samp{\`}, and @samp{\'} match @samp{w}, @samp{W}, @samp{<}, @samp{>}, @samp{b}, @samp{B}, @samp{`}, and @samp{'} respectively. + Grouping is performed with parentheses @samp{()}. An unmatched @samp{)} matches just itself. A backslash followed by a digit matches that digit. The alternation operator is @samp{|}. The characters @samp{^} and @samp{$} always represent the beginning and end of a string respectively, except within square brackets. Within brackets, @samp{^} can be used to invert the membership of the character class being specified. 
+ @samp{*}, @samp{+} and @samp{?} are special at any point in a regular expression except: @enumerate @item At the beginning of a regular expression -@item After an open-group, signified by -@samp{(} +@item After an open-group, signified by @samp{(} + @item After the alternation operator @samp{|} @end enumerate @@ -71,28 +74,28 @@ The characters @samp{^} and @samp{$} always represent the beginning and end of a The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups. -@node egrep regular expression syntax -@subsection @samp{egrep} regular expression syntax +@node ed regular expression syntax +@subsection @samp{ed} regular expression syntax -The character @samp{.} matches any single character. +The character @samp{.} matches any single character except the null character. @table @samp -@item + -indicates that the regular expression should match one or more occurrences of the previous atom or regexp. -@item ? -indicates that the regular expression should match zero or one occurrence of the previous atom or regexp. @item \+ -matches a @samp{+} +indicates that the regular expression should match one or more occurrences of the previous atom or regexp. @item \? -matches a @samp{?}. +indicates that the regular expression should match zero or one occurrence of the previous atom or regexp. +@item + and ? +match themselves. + @end table Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example @samp{[z-a]}, are invalid. Within square brackets, @samp{\} is taken literally. Character classes are supported; for example @samp{[[:digit:]]} will match a single decimal digit. + GNU extensions are supported: @enumerate @@ -115,39 +118,77 @@ GNU extensions are supported: @end enumerate -Grouping is performed with parentheses @samp{()}. An unmatched @samp{)} matches just itself. 
A backslash followed by a digit acts as a back-reference and matches the same thing as the previous grouped expression indicated by that number. For example @samp{\2} matches the second group expression. The order of group expressions is determined by the position of their opening parenthesis @samp{(}. +Grouping is performed with backslashes followed by parentheses @samp{\(}, @samp{\)}. A backslash followed by a digit acts as a back-reference and matches the same thing as the previous grouped expression indicated by that number. For example @samp{\2} matches the second group expression. The order of group expressions is determined by the position of their opening parenthesis @samp{\(}. -The alternation operator is @samp{|}. +The alternation operator is @samp{\|}. -The characters @samp{^} and @samp{$} always represent the beginning and end of a string respectively, except within square brackets. Within brackets, @samp{^} can be used to invert the membership of the character class being specified. +The character @samp{^} only represents the beginning of a string when it appears: +@enumerate -The characters @samp{*}, @samp{+} and @samp{?} are special anywhere in a regular expression. +@item At the beginning of a regular expression + +@item After an open-group, signified by @samp{\(} + + +@item After the alternation operator @samp{\|} + +@end enumerate + + +The character @samp{$} only represents the end of a string when it appears: +@enumerate + +@item At the end of a regular expression + +@item Before a close-group, signified by @samp{\)} + +@item Before the alternation operator @samp{\|} + +@end enumerate + + +@samp{\*}, @samp{\+} and @samp{\?} are special at any point in a regular expression except: +@enumerate + +@item At the beginning of a regular expression + +@item After an open-group, signified by @samp{\(} + +@item After the alternation operator @samp{\|} + +@end enumerate + + +Intervals are specified by @samp{\@{} and @samp{\@}}. 
+Invalid intervals such as @samp{a\@{1z} are not accepted. -Intervals are specified by @samp{@{} and @samp{@}}. Invalid intervals are treated as literals, for example @samp{a@{1} is treated as @samp{a\@{1} The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups. -@node ed regular expression syntax -@subsection @samp{ed} regular expression syntax +@node egrep regular expression syntax +@subsection @samp{egrep} regular expression syntax -The character @samp{.} matches any single character except the null character. +The character @samp{.} matches any single character. @table @samp -@item \+ +@item + indicates that the regular expression should match one or more occurrences of the previous atom or regexp. -@item \? +@item ? indicates that the regular expression should match zero or one occurrence of the previous atom or regexp. -@item + and ? -match themselves. +@item \+ +matches a @samp{+} +@item \? +matches a @samp{?}. @end table Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example @samp{[z-a]}, are invalid. Within square brackets, @samp{\} is taken literally. Character classes are supported; for example @samp{[[:digit:]]} will match a single decimal digit. + GNU extensions are supported: @enumerate @@ -170,49 +211,18 @@ GNU extensions are supported: @end enumerate -Grouping is performed with backslashes followed by parentheses @samp{\(}, @samp{\)}. A backslash followed by a digit acts as a back-reference and matches the same thing as the previous grouped expression indicated by that number. For example @samp{\2} matches the second group expression. The order of group expressions is determined by the position of their opening parenthesis @samp{\(}. - -The alternation operator is @samp{\|}. 
- -The character @samp{^} only represents the beginning of a string when it appears: -@enumerate - -@item -At the beginning of a regular expression - -@item After an open-group, signified by -@samp{\(} - -@item After the alternation operator @samp{\|} - -@end enumerate - - -The character @samp{$} only represents the end of a string when it appears: -@enumerate - -@item At the end of a regular expression - -@item Before a close-group, signified by -@samp{\)} -@item Before the alternation operator @samp{\|} - -@end enumerate - +Grouping is performed with parentheses @samp{()}. An unmatched @samp{)} matches just itself. A backslash followed by a digit acts as a back-reference and matches the same thing as the previous grouped expression indicated by that number. For example @samp{\2} matches the second group expression. The order of group expressions is determined by the position of their opening parenthesis @samp{(}. -@samp{\*}, @samp{\+} and @samp{\?} are special at any point in a regular expression except: -@enumerate +The alternation operator is @samp{|}. -@item At the beginning of a regular expression +The characters @samp{^} and @samp{$} always represent the beginning and end of a string respectively, except within square brackets. Within brackets, @samp{^} can be used to invert the membership of the character class being specified. -@item After an open-group, signified by -@samp{\(} -@item After the alternation operator @samp{\|} -@end enumerate +The characters @samp{*}, @samp{+} and @samp{?} are special anywhere in a regular expression. -Intervals are specified by @samp{\@{} and @samp{\@}}. Invalid intervals such as @samp{a\@{1z} are not accepted. +Intervals are specified by @samp{@{} and @samp{@}}. +Invalid intervals are treated as literals, for example @samp{a@{1} is treated as @samp{a\@{1} The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups. 
@@ -237,7 +247,8 @@ matches a @samp{?}. @end table -Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example @samp{[z-a]}, are ignored. Within square brackets, @samp{\} is taken literally. Character classes are not supported, so for example you would need to use @samp{[0-9]} instead of @samp{[[:digit:]]}. +Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example @samp{[z-a]}, are ignored. Within square brackets, @samp{\} is taken literally. Character classes are supported; for example @samp{[[:digit:]]} will match a single decimal digit. + GNU extensions are supported: @enumerate @@ -268,11 +279,10 @@ The alternation operator is @samp{\|}. The character @samp{^} only represents the beginning of a string when it appears: @enumerate -@item -At the beginning of a regular expression +@item At the beginning of a regular expression + +@item After an open-group, signified by @samp{\(} -@item After an open-group, signified by -@samp{\(} @item After the alternation operator @samp{\|} @@ -284,8 +294,8 @@ The character @samp{$} only represents the end of a string when it appears: @item At the end of a regular expression -@item Before a close-group, signified by -@samp{\)} +@item Before a close-group, signified by @samp{\)} + @item Before the alternation operator @samp{\|} @end enumerate @@ -296,13 +306,15 @@ The character @samp{$} only represents the end of a string when it appears: @item At the beginning of a regular expression -@item After an open-group, signified by -@samp{\(} +@item After an open-group, signified by @samp{\(} + @item After the alternation operator @samp{\|} @end enumerate +Intervals are specified by @samp{\@{} and @samp{\@}}. +Invalid intervals such as @samp{a\@{1z} are not accepted. 
The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups. @@ -330,6 +342,7 @@ matches a @samp{?}. Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example @samp{[z-a]}, are invalid. Within square brackets, @samp{\} can be used to quote the following character. Character classes are supported; for example @samp{[[:digit:]]} will match a single decimal digit. + GNU extensions are supported: @enumerate @@ -358,19 +371,21 @@ The alternation operator is @samp{|}. The characters @samp{^} and @samp{$} always represent the beginning and end of a string respectively, except within square brackets. Within brackets, @samp{^} can be used to invert the membership of the character class being specified. + @samp{*}, @samp{+} and @samp{?} are special at any point in a regular expression except: @enumerate @item At the beginning of a regular expression -@item After an open-group, signified by -@samp{(} +@item After an open-group, signified by @samp{(} + @item After the alternation operator @samp{|} @end enumerate -Intervals are specified by @samp{@{} and @samp{@}}. Invalid intervals are treated as literals, for example @samp{a@{1} is treated as @samp{a\@{1} +Intervals are specified by @samp{@{} and @samp{@}}. +Invalid intervals are treated as literals, for example @samp{a@{1} is treated as @samp{a\@{1} The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups. @@ -390,11 +405,13 @@ indicates that the regular expression should match one or more occurrences of th indicates that the regular expression should match zero or one occurrence of the previous atom or regexp. @item + and ? match themselves. + @end table Bracket expressions are used to match ranges of characters. 
Bracket expressions where the range is backward, for example @samp{[z-a]}, are invalid. Within square brackets, @samp{\} is taken literally. Character classes are supported; for example @samp{[[:digit:]]} will match a single decimal digit. + GNU extensions are supported: @enumerate @@ -424,11 +441,10 @@ The alternation operator is @samp{\|}. The character @samp{^} only represents the beginning of a string when it appears: @enumerate -@item -At the beginning of a regular expression +@item At the beginning of a regular expression + +@item After an open-group, signified by @samp{\(} -@item After an open-group, signified by -@samp{\(} @item After a newline @@ -442,8 +458,8 @@ The character @samp{$} only represents the end of a string when it appears: @item At the end of a regular expression -@item Before a close-group, signified by -@samp{\)} +@item Before a close-group, signified by @samp{\)} + @item Before a newline @item Before the alternation operator @samp{\|} @@ -456,8 +472,8 @@ The character @samp{$} only represents the end of a string when it appears: @item At the beginning of a regular expression -@item After an open-group, signified by -@samp{\(} +@item After an open-group, signified by @samp{\(} + @item After a newline @item After the alternation operator @samp{\|} @@ -465,7 +481,9 @@ The character @samp{$} only represents the end of a string when it appears: @end enumerate -Intervals are specified by @samp{\@{} and @samp{\@}}. Invalid intervals such as @samp{a\@{1z} are not accepted. +Intervals are specified by @samp{\@{} and @samp{\@}}. +Invalid intervals such as @samp{a\@{1z} are not accepted. + The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups. @@ -492,27 +510,31 @@ matches a @samp{?}. Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example @samp{[z-a]}, are invalid. 
Within square brackets, @samp{\} can be used to quote the following character. Character classes are supported; for example @samp{[[:digit:]]} will match a single decimal digit. + GNU extensions are not supported and so @samp{\w}, @samp{\W}, @samp{\<}, @samp{\>}, @samp{\b}, @samp{\B}, @samp{\`}, and @samp{\'} match @samp{w}, @samp{W}, @samp{<}, @samp{>}, @samp{b}, @samp{B}, @samp{`}, and @samp{'} respectively. + Grouping is performed with parentheses @samp{()}. An unmatched @samp{)} matches just itself. A backslash followed by a digit acts as a back-reference and matches the same thing as the previous grouped expression indicated by that number. For example @samp{\2} matches the second group expression. The order of group expressions is determined by the position of their opening parenthesis @samp{(}. The alternation operator is @samp{|}. The characters @samp{^} and @samp{$} always represent the beginning and end of a string respectively, except within square brackets. Within brackets, @samp{^} can be used to invert the membership of the character class being specified. + @samp{*}, @samp{+} and @samp{?} are special at any point in a regular expression except the following places, where they are not allowed: @enumerate @item At the beginning of a regular expression -@item After an open-group, signified by -@samp{(} +@item After an open-group, signified by @samp{(} + @item After the alternation operator @samp{|} @end enumerate -Intervals are specified by @samp{@{} and @samp{@}}. Invalid intervals are treated as literals, for example @samp{a@{1} is treated as @samp{a\@{1} +Intervals are specified by @samp{@{} and @samp{@}}. +Invalid intervals are treated as literals, for example @samp{a@{1} is treated as @samp{a\@{1} The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups. @@ -545,6 +567,7 @@ matches a @samp{?}. Bracket expressions are used to match ranges of characters. 
Bracket expressions where the range is backward, for example @samp{[z-a]}, are invalid. Within square brackets, @samp{\} is taken literally. Character classes are supported; for example @samp{[[:digit:]]} will match a single decimal digit. + GNU extensions are supported: @enumerate @@ -573,19 +596,22 @@ The alternation operator is @samp{|}. The characters @samp{^} and @samp{$} always represent the beginning and end of a string respectively, except within square brackets. Within brackets, @samp{^} can be used to invert the membership of the character class being specified. + @samp{*}, @samp{+} and @samp{?} are special at any point in a regular expression except the following places, where they are not allowed: @enumerate @item At the beginning of a regular expression -@item After an open-group, signified by -@samp{(} +@item After an open-group, signified by @samp{(} + @item After the alternation operator @samp{|} @end enumerate -Intervals are specified by @samp{@{} and @samp{@}}. Invalid intervals such as @samp{a@{1z} are not accepted. +Intervals are specified by @samp{@{} and @samp{@}}. +Invalid intervals such as @samp{a@{1z} are not accepted. + The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups. @@ -600,6 +626,7 @@ The character @samp{.} matches any single character except the null character. Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example @samp{[z-a]}, are invalid. Within square brackets, @samp{\} is taken literally. Character classes are supported; for example @samp{[[:digit:]]} will match a single decimal digit. 
+ GNU extensions are supported: @enumerate @@ -629,11 +656,10 @@ Grouping is performed with backslashes followed by parentheses @samp{\(}, @samp{ The character @samp{^} only represents the beginning of a string when it appears: @enumerate -@item -At the beginning of a regular expression +@item At the beginning of a regular expression + +@item After an open-group, signified by @samp{\(} -@item After an open-group, signified by -@samp{\(} @end enumerate @@ -643,18 +669,20 @@ The character @samp{$} only represents the end of a string when it appears: @item At the end of a regular expression -@item Before a close-group, signified by -@samp{\)} +@item Before a close-group, signified by @samp{\)} + @end enumerate -Intervals are specified by @samp{\@{} and @samp{\@}}. Invalid intervals such as @samp{a\@{1z} are not accepted. +Intervals are specified by @samp{\@{} and @samp{\@}}. +Invalid intervals such as @samp{a\@{1z} are not accepted. + The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups. @node sed regular expression syntax @subsection @samp{sed} regular expression syntax -This is a synonym for ed. +This is a synonym for ed. \ No newline at end of file -- 2.50.0
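The regenerated emacs section above reflects three behavioral changes. Character classes are now supported, intervals are written as '\{' and '\}', and invalid intervals such as 'a\{1z' are rejected. These can be observed directly through the GNU regex API. Below is a minimal sketch, not part of the patch, and it rests on a few assumptions:

 * It needs glibc's or Gnulib's regex.h with _GNU_SOURCE defined.
 * The helper 'matches' and the constants 'EMACS_OLD' and 'EMACS_NEW' are hypothetical names used only for illustration.
 * EMACS_OLD is the former RE_SYNTAX_EMACS value of 0.
 * EMACS_NEW sets only the two bits the regenerated text implies, RE_CHAR_CLASSES and RE_INTERVALS. It is not a copy of the current RE_SYNTAX_EMACS definition, which may contain further bits.

Setting the bits explicitly keeps the result independent of which regex.h happens to be installed.

```c
#define _GNU_SOURCE /* expose the GNU regex API: re_compile_pattern, re_search, RE_* bits */
#include <regex.h>
#include <string.h>

/* Hypothetical constants for this sketch.  EMACS_OLD is the former
   RE_SYNTAX_EMACS value (0).  EMACS_NEW is an assumption inferred from
   the regenerated documentation, not the actual regex.h definition.  */
static const reg_syntax_t EMACS_OLD = 0;
static const reg_syntax_t EMACS_NEW = RE_CHAR_CLASSES | RE_INTERVALS;

/* Hypothetical helper: compile PATTERN under SYNTAX and search TEXT.
   Returns 1 on a match, 0 on no match, -1 if PATTERN is rejected.  */
static int
matches (reg_syntax_t syntax, const char *pattern, const char *text)
{
  struct re_pattern_buffer buf;
  memset (&buf, 0, sizeof buf);   /* no preallocated buffer, fastmap or translate table */

  re_set_syntax (syntax);         /* re_compile_pattern uses the global syntax bits */
  if (re_compile_pattern (pattern, strlen (pattern), &buf) != NULL)
    return -1;                    /* a non-NULL result is an error message */

  regoff_t len = (regoff_t) strlen (text);
  regoff_t pos = re_search (&buf, text, len, 0, len, NULL);
  regfree (&buf);
  return pos >= 0;                /* -1 means no match, -2 an internal error */
}
```

Under EMACS_OLD, '[[:digit:]]' is an ordinary bracket expression over the characters '[', ':', 'd', 'i', 'g', 't', followed by a literal ']'. It therefore matches "d]" but not "5". Likewise, 'a\{2\}' matches only the literal text "a{2}".

Under EMACS_NEW, the same patterns match a decimal digit and "aa" respectively. 'a\{1z' fails to compile, and back-references such as '\(ab\)\1' work in both variants.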