Yesterday's change to regex.m4 has the effect that now, gnulib's regex code
gets used even on glibc systems. As a consequence, the ASAN+UBSAN build
in gnulib's CI now fails:

FAIL: test-regex
../../gllib/regexec.c:188:36: runtime error: variable length array bound evaluates to non-positive value 0

What the clang UBSAN is complaining about is this definition of the
regexec function:

int
regexec (const regex_t *__restrict preg, const char *__restrict string,
         size_t nmatch, regmatch_t pmatch[_REGEX_NELTS (nmatch)], int eflags)
{ ... }

According to ISO C23 § 6.7.6.2.(5) the value of nmatch must be > 0 here.
Quote:
  "If the size is an expression that is not an integer constant expression:
   if it occurs in a declaration at function prototype scope, it is treated
   as if it were replaced by *; otherwise, each time it is evaluated it
   shall have a value greater than zero."
(Here we're in a function definition, not a function prototype.)

But the comments in regexec.c:174..175 indicate that nmatch is allowed to
be 0, and apparently the test suite exercises this case.

So, we can't use the syntax
  size_t nmatch, regmatch_t pmatch[nmatch]
here: whenever nmatch is 0, it is undefined behaviour.
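
To illustrate the point, here is a minimal sketch of a definition that
stays well-defined when the count is 0. It is not gnulib code: match_t
and count_matched are made-up names standing in for regmatch_t and a
regexec-like function. The array bound is left unspecified in the
definition, so no size expression is evaluated at call time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for regmatch_t.  */
typedef struct { int rm_so; int rm_eo; } match_t;

/* The bound is only a comment, so calling this with n == 0 does not
   evaluate a variable-length array size.  Writing 'match_t m[n]' here
   instead would make such a call undefined behaviour.  */
static size_t
count_matched (size_t n, match_t m[/* n */])
{
  /* A null pointer is acceptable only when there are no elements.  */
  assert (n == 0 || m != NULL);

  size_t count = 0;
  for (size_t i = 0; i < n; i++)
    if (m[i].rm_so >= 0)
      count++;
  return count;
}
```

With this form, a call such as count_matched (0, NULL) never touches the
array and never evaluates a zero bound.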

I tried two patches, attached below. The second one has the advantage that
it leaves the declaration of regexec() intact, which is a plus for static
analyzers. But it introduces a new warning:

In file included from ../../gllib/regex.c:71:
../../gllib/regexec.c:192:29: warning: argument 'pmatch' of type 'regmatch_t[]' with mismatched bound [-Warray-parameter]
  192 |          size_t nmatch, regmatch_t pmatch[/* nmatch */], int eflags)
      |                                    ^
../../gllib/regex.h:687:18: note: previously declared as 'regmatch_t[restrict __nmatch]' here
  687 |                     regmatch_t __pmatch[_Restrict_arr_
      |                                ^

So, I'm committing the first one.
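
As a rough sketch of why the first approach should not produce a
mismatch between declaration and definition: when the bound macro
expands to nothing, both places end up with the same unsized array
parameter. NELTS and sum_elts below are hypothetical names used only
for illustration; they are not part of regex.h.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical analogue of _REGEX_NELTS defined to empty.  */
#define NELTS(n)

/* Declaration, as it might appear in a header:
   expands to 'const int a[]'.  */
long sum_elts (size_t n, const int a[NELTS (n)]);

/* Definition: expands to the same 'const int a[]', so the declared
   and defined parameter types agree, and n may be 0.  */
long
sum_elts (size_t n, const int a[NELTS (n)])
{
  /* A null pointer is acceptable only when there are no elements.  */
  assert (n == 0 || a != NULL);

  long total = 0;
  for (size_t i = 0; i < n; i++)
    total += a[i];
  return total;
}
```

The cost of this choice is that the parameter no longer carries its
element count, so static analyzers get less information.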

Bruno
From e9e73bdeab431f29bb263b757bc8558796e475f6 Mon Sep 17 00:00:00 2001
From: Bruno Haible <br...@clisp.org>
Date: Mon, 14 Apr 2025 16:00:13 +0200
Subject: [PATCH] regex: Fix undefined behaviour.

* lib/regex.h (_REGEX_NELTS): Define to empty; don't use ISO C99
variable-length arrays.
---
 ChangeLog   | 6 ++++++
 lib/regex.h | 8 ++++++--
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 4aa2a83c08..0b1d316a24 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,9 @@
+2025-04-14  Bruno Haible  <br...@clisp.org>
+
+	regex: Fix undefined behaviour.
+	* lib/regex.h (_REGEX_NELTS): Define to empty; don't use ISO C99
+	variable-length arrays.
+
 2025-04-14  Bruno Haible  <br...@clisp.org>
 
 	select tests: Work around a Cygwin bug.
diff --git a/lib/regex.h b/lib/regex.h
index ff7e43b534..0eb72ce908 100644
--- a/lib/regex.h
+++ b/lib/regex.h
@@ -523,8 +523,12 @@ typedef struct
 /* Declarations for routines.  */
 
 #ifndef _REGEX_NELTS
-# if (defined __STDC_VERSION__ && 199901L <= __STDC_VERSION__ \
-	&& !defined __STDC_NO_VLA__)
+/* The macro _REGEX_NELTS denotes the number of elements in a variable-length
+   array passed to a function.
+   It was meant to make use of ISO C99 variable-length arrays, but this does
+   not work: ISO C23 § 6.7.6.2.(5) requires the number of elements to be > 0,
+   but the NMATCH argument to regexec() is allowed to be 0.  */
+# if 0
 #  define _REGEX_NELTS(n) n
 # else
 #  define _REGEX_NELTS(n)
-- 
2.43.0

From 48e8974874bd5fad45904aed9679ee25b5caefbe Mon Sep 17 00:00:00 2001
From: Bruno Haible <br...@clisp.org>
Date: Mon, 14 Apr 2025 16:15:27 +0200
Subject: [PATCH] regex: Fix undefined behaviour.

* lib/regex.h (_REGEX_NELTS): Add comment.
* lib/regexec.c (regexec): Don't use ISO C variable-length array syntax
for the pmatch parameter.
---
 ChangeLog     | 7 +++++++
 lib/regex.h   | 2 ++
 lib/regexec.c | 6 +++++-
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/ChangeLog b/ChangeLog
index 4aa2a83c08..a835a069d6 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,10 @@
+2025-04-14  Bruno Haible  <br...@clisp.org>
+
+	regex: Fix undefined behaviour.
+	* lib/regex.h (_REGEX_NELTS): Add comment.
+	* lib/regexec.c (regexec): Don't use ISO C variable-length array syntax
+	for the pmatch parameter.
+
 2025-04-14  Bruno Haible  <br...@clisp.org>
 
 	select tests: Work around a Cygwin bug.
diff --git a/lib/regex.h b/lib/regex.h
index ff7e43b534..191bd26836 100644
--- a/lib/regex.h
+++ b/lib/regex.h
@@ -522,6 +522,8 @@ typedef struct
 
 /* Declarations for routines.  */
 
+/* The macro _REGEX_NELTS denotes the number of elements in a variable-length
+   array passed to a function.  */
 #ifndef _REGEX_NELTS
 # if (defined __STDC_VERSION__ && 199901L <= __STDC_VERSION__ \
 	&& !defined __STDC_NO_VLA__)
diff --git a/lib/regexec.c b/lib/regexec.c
index 6923394a08..1f902b1ef6 100644
--- a/lib/regexec.c
+++ b/lib/regexec.c
@@ -183,9 +183,13 @@ static reg_errcode_t extend_buffers (re_match_context_t *mctx, int min_len);
    Return 0 if a match is found, REG_NOMATCH if not, REG_BADPAT if
    EFLAGS is invalid.  */
 
+/* The declaration of the PMATCH parameter cannot make use of ISO C99
+   variable-length arrays: ISO C23 § 6.7.6.2.(5) requires the number of
+   elements to be > 0, but the NMATCH argument is allowed to be 0.  */
+
 int
 regexec (const regex_t *__restrict preg, const char *__restrict string,
-	 size_t nmatch, regmatch_t pmatch[_REGEX_NELTS (nmatch)], int eflags)
+	 size_t nmatch, regmatch_t pmatch[/* nmatch */], int eflags)
 {
   reg_errcode_t err;
   Idx start, length;
-- 
2.43.0
