Since Jim's regex changes from 2026-04-12, the GNU sed continuous integration
fails when built with sanitizers. One test fails:


FAIL: testsuite/dc
==================

../lib/regex_internal.c:1289:7: runtime error: execution reached an unreachable program point
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ../lib/regex_internal.c:1289:7 in 
../testsuite/dc.sh: line 53: 14403 Done                    ( echo 2002; cat easter.dc )
     14404 Aborted                 (core dumped) | sed -n -f "$dir/dc.sed" > easter-out
--- easter-exp  2026-04-19 08:31:22.828781794 +0000
+++ easter-out  2026-04-19 08:31:22.830781802 +0000
@@ -1,2 +0,0 @@
-31
-March 2002


To make things easier to debug with Gnulib alone (without the GNU sed sources),
I've extracted the failing scenario and am committing it here into Gnulib.

The failure is a failed DEBUG_ASSERT (one that is not executed in normal builds).
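
For readers unfamiliar with this kind of report, here is a minimal sketch of how
an assertion-like macro can surface as "execution reached an unreachable program
point" under UBSAN. The macro SKETCH_DEBUG_ASSERT and the function checked_index
are hypothetical names made up for illustration; this is not a copy of Gnulib's
actual DEBUG_ASSERT definition.

```c
/* Hypothetical stand-in for illustration; not Gnulib's actual macro.
   When the condition is false, control reaches __builtin_unreachable (),
   which is undefined behavior in a normal build and a reported runtime
   error when compiled with -fsanitize=undefined.  */
#define SKETCH_DEBUG_ASSERT(cond) \
  ((cond) ? (void) 0 : __builtin_unreachable ())

/* Return IDX unchanged; the macro states the range the caller promises.  */
int
checked_index (int idx, int len)
{
  SKETCH_DEBUG_ASSERT (0 <= idx && idx < len);
  return idx;
}
```

Calling checked_index (3, 10) returns 3 with no diagnostic. A call that violates
the stated range would be silent in an unsanitized build and would abort with
the sanitizer options shown further down.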

This is *not* a regression caused by Jim's and Paul's commits from last week.
Rather, this assertion failure was already present in the Gnulib code before
these changes. It just went unnoticed because, on recent glibc systems, the
autoconfiguration used to pick the glibc implementation of the regex code,
whereas now it picks the Gnulib implementation.

To reproduce the problem:

$ ./gnulib-tool --create-testdir --dir=../testdir --symlink \
  --avoid=memchr-tests --avoid=strncpy-tests regex
$ cd ../testdir

With gcc:
export CC="gcc -fsanitize=undefined -fno-sanitize-recover=undefined"
export CFLAGS="-O1 -fno-omit-frame-pointer -ggdb"

With clang:
export CC="clang -fsanitize=undefined,signed-integer-overflow,shift,integer-divide-by-zero -fno-sanitize-recover=undefined"
export CFLAGS="-O1 -fno-omit-frame-pointer -ggdb"

$ ./configure
$ make
$ make check

I suggest investigating this before the next 'sed' and 'coreutils'
releases.
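
To get a feel for what the address pattern from dc.sed matches, independently of
the long failing input, here is a small sketch. It uses the POSIX regcomp and
regexec functions instead of the GNU re_set_syntax, re_compile_pattern and
re_search calls that the test case uses, so the syntax bits differ from the
test. The helper name dc_address_matches and the short sample inputs are made up
for illustration.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if LINE matches the dc.sed address
   pattern, 0 if it does not match (or matching fails), and -1 if the
   pattern does not compile.  */
int
dc_address_matches (const char *line)
{
  /* Same pattern text as in the test case.  Passing 0 as the flags
     selects POSIX basic regular expression syntax, in which \( \) are
     groups and \1, \2, \3 are back references.  */
  static const char addr[] =
    "^<\\([^~]*\\)\\([^~]\\)[^~]*~\\1\\(.\\).*|=.*\\3.*\\2";
  regex_t regex;
  if (regcomp (&regex, addr, 0) != 0)
    return -1;
  /* No submatch offsets are requested, hence nmatch = 0 and pmatch = NULL.  */
  int rc = regexec (&regex, line, 0, NULL, 0);
  regfree (&regex);
  return rc == 0;
}
```

With this helper, "<ab~aX|=Xb" matches (group 1 is "a", group 2 is "b", group 3
is "X"), while "<ab~aX|=Xc" does not, because no choice of groups lets the
trailing back references line up.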


2026-04-19  Bruno Haible  <[email protected]>

        regex tests: Add a test case that triggers an assertion failure.
        * tests/test-regex.c (main): Add a test case related to back references.

diff --git a/tests/test-regex.c b/tests/test-regex.c
index d747eefdc7..b7d6aa91a4 100644
--- a/tests/test-regex.c
+++ b/tests/test-regex.c
@@ -505,6 +505,37 @@ main (void)
       }
   }
 
+  /* An assertion failure related to back references, seen in sed's dc.sed.
+     To reproduce, use gcc or clang with UBSAN.  */
+  {
+    const char *addr = "^<\\([^~]*\\)\\([^~]\\)[^~]*~\\1\\(.\\).*|=.*\\3.*\\2";
+    const char *input =
+      "<,1583~,2002~2002~|P|K0|I10|O10|rpddsf[lfp[too early\n"
+      "]Pq]s@1583>@\n"
+      "ddd19%1+sg100/1+d3*4/12-sx8*5+25/5-sz5*4/lx-10-sdlg11*20+lz+lx-30%\n"
+      "d[30+]s@0>@d[[1+]s@lg11<@]s@25=@d[1+]s@24=@se44le-d[30+]s@21>@dld+7%-7+\n"
+      "[March ]smd[31-[April ]sm]s@31<@psnlmPpsn1z>p~|rf2002~|r@lfp[too early\n"
+      "]Pq~|?>@\n"
+      "ddd19%1+sg100/1+d3*4/12-sx8*5+25/5-sz5*4/lx-10-sdlg11*20+lz+lx-30%\n"
+      "d[30+]s@0>@d[[1+]s@lg11<@]s@25=@d[1+]s@24=@se44le-d[30+]s@21>@dld+7%-7+\n"
+      "[March ]smd[31-[April ]sm]s@31<@psnlmPpsn1z>p~|=-~.0,123456789<><";
+    re_set_syntax (RE_NO_SUB | RE_NO_POSIX_BACKTRACKING | RE_NO_EMPTY_RANGES
+                   | RE_INTERVALS | RE_DOT_NEWLINE | RE_CHAR_CLASSES
+                   | RE_BK_PLUS_QM);
+    regex_t regex;
+    memset (&regex, 0, sizeof regex);
+    const char *errmsg = re_compile_pattern (addr, strlen (addr), &regex);
+    if (errmsg)
+      report_error ("dc.sed preparation failure: %s", errmsg);
+    else
+      {
+        regex.translate = NULL;
+        regoff_t ret = re_search (&regex, input, strlen (input),
+                                  0, strlen (input), NULL);
+        (void) ret;
+      }
+  }
+
 #if 0
   /* It would be nice to reject hosts whose regoff_t values are too
      narrow (including glibc on hosts with 64-bit ptrdiff_t and



