Hello all,
I suspect there is an uninitialized memory access deep inside
regex_internal.c under very particular circumstances.
This was first reported by "project-repo <b...@feusi.co>"
as part of his fuzzing efforts, here:
https://lists.gnu.org/r/sed-devel/2018-08/msg00017.html
I've been able to pinpoint the cause, but I'm still learning
the code, so I can't suggest a fix yet.
The offending combination:
1. UTF-8 locale
2. case insensitive regex (REG_ICASE)
3. Using gnulib's regex even if on a glibc system
( --with-included-regex )
4. _REGEX_LARGE_OFFSETS=1 in config.h, causing "regoff_t" to be
ssize_t instead of int.
5. regex containing a valid multibyte character whose
uppercase is different, resulting in "re_string_t->offsets_needed=1"
6. regex containing backslash-NUL.
Then the problem is:
1. build_wcs_upper_buffer() allocates the 'offsets' member
of a "re_string_t" but does not initialize all elements.
2. re_string_peek_byte_case() accesses an uninitialized element.
Steps to reproduce it reliably are below.
"--with-included-regex" is needed to force using gnulib,
and to force _REGEX_LARGE_OFFSETS to be 1 (bug does not happen without
it).
The tiny patch just fills the newly allocated buffer with 0xBC to ensure
the invalid memory access triggers a segfault.
git clone git://git.sv.gnu.org/sed.git
cd sed
./bootstrap
patch -p1 < regex-int-add-memset.patch
./configure --with-included-regex CFLAGS="-O0 -g"
make
printf "/\xe1\xbe\xbe\x5c\x00/I" > 1.sed
sed/sed -f 1.sed < /dev/null
With valgrind:
====
==29631== Memcheck, a memory error detector
==29631== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==29631== Using Valgrind-3.12.0.SVN and LibVEX; rerun with -h for
copyright info
==29631== Command: sed/sed -f 1.sed
==29631==
==29631== Invalid read of size 1
==29631== at 0x122403: re_string_peek_byte_case (regex_internal.c:860)
==29631== by 0x127FD0: peek_token (regcomp.c:1830)
==29631== by 0x127E93: fetch_token (regcomp.c:1790)
==29631== by 0x129605: parse_expression (regcomp.c:2459)
==29631== by 0x128CE8: parse_branch (regcomp.c:2221)
==29631== by 0x128AFB: parse_reg_exp (regcomp.c:2173)
==29631== by 0x1289DE: parse (regcomp.c:2141)
==29631== by 0x12573F: re_compile_internal (regcomp.c:803)
==29631== by 0x12473D: rpl_re_compile_pattern (regcomp.c:230)
==29631== by 0x111A57: compile_regex_1 (regexp.c:113)
==29631== by 0x111CD4: compile_regex (regexp.c:190)
==29631== by 0x10C73C: compile_address (compile.c:953)
==29631== Address 0xbcbcbcbcc21b51e6 is not stack'd, malloc'd or
(recently) free'd
==29631==
==29631==
==29631== Process terminating with default action of signal 11 (SIGSEGV)
==29631== General Protection Fault
==29631== at 0x122403: re_string_peek_byte_case (regex_internal.c:860)
==29631== by 0x127FD0: peek_token (regcomp.c:1830)
==29631== by 0x127E93: fetch_token (regcomp.c:1790)
==29631== by 0x129605: parse_expression (regcomp.c:2459)
==29631== by 0x128CE8: parse_branch (regcomp.c:2221)
==29631== by 0x128AFB: parse_reg_exp (regcomp.c:2173)
==29631== by 0x1289DE: parse (regcomp.c:2141)
==29631== by 0x12573F: re_compile_internal (regcomp.c:803)
==29631== by 0x12473D: rpl_re_compile_pattern (regcomp.c:230)
==29631== by 0x111A57: compile_regex_1 (regexp.c:113)
==29631== by 0x111CD4: compile_regex (regexp.c:190)
==29631== by 0x10C73C: compile_address (compile.c:953)
==29631==
==29631== HEAP SUMMARY:
==29631== in use at exit: 8,094 bytes in 16 blocks
==29631== total heap usage: 54 allocs, 38 frees, 16,395 bytes allocated
==29631==
==29631== LEAK SUMMARY:
==29631== definitely lost: 0 bytes in 0 blocks
==29631== indirectly lost: 0 bytes in 0 blocks
==29631== possibly lost: 0 bytes in 0 blocks
==29631== still reachable: 8,094 bytes in 16 blocks
==29631== suppressed: 0 bytes in 0 blocks
==29631== Rerun with --leak-check=full to see details of leaked memory
==29631==
==29631== For counts of detected and suppressed errors, rerun with: -v
==29631== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
Segmentation fault
====
With GDB, breaking at "re_string_peek_byte_case"
and checking the upper-to-lower offsets, the last 3 elements
are clearly not initialized (they contain "0xBC").
===
$ gdb sed/sed
(gdb) b re_string_peek_byte_case
Breakpoint 1 at 0x1a2d1: file lib/regex_internal.c, line 845.
(gdb) r -f 1.sed < /dev/null
Starting program: /tmp/sed/sed/sed -f 1.sed < /dev/null
Breakpoint 1, re_string_peek_byte_case (pstr=0x7fffffffdc30, idx=1) at
lib/regex_internal.c:845
845 if (BE (!pstr->mbs_allocated, 1))
(gdb) x /6xg pstr->offsets
0x55555578fef0: 0x0000000000000000 0x0000000000000001
0x55555578ff00: 0x0000000000000003 0xbcbcbcbcbcbcbcbc
0x55555578ff10: 0xbcbcbcbcbcbcbcbc 0xbcbcbcbcbcbcbcbc
(gdb)
===
Interestingly, if sed is compiled with native glibc's regex code,
and with _REGEX_LARGE_OFFSETS not defined (meaning "regoff_t" is int),
the "offsets" elements are initialized correctly:
====
$ ./configure CFLAGS="-O0 -g"
$ make
$ gdb sed/sed
(gdb) b re_string_peek_byte_case
Function "re_string_peek_byte_case" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (re_string_peek_byte_case) pending.
(gdb) r -f 1.sed < /dev/null
Starting program: /tmp/sed/sed/sed -f 1.sed < /dev/null
Breakpoint 1, peek_token (token=token@entry=0x7fffffffdc50,
input=input@entry=0x7fffffffdc70,
syntax=syntax@entry=54854214) at regcomp.c:1796
1796 regcomp.c: No such file or directory.
(gdb) s
re_string_peek_byte_case (idx=1, pstr=0x7fffffffdc70) at
regex_internal.c:840
840 regex_internal.c: No such file or directory.
(gdb) x /10xw pstr->offsets
0x555555777e30: 0x00000000 0x00000001 0x00000003 0x00000004
0x555555777e40: 0x00000000 0x00000000 0x000003d1 0x00000000
0x555555777e50: 0x00000000 0x00000000
====
I don't have a fix yet, but if anyone has ideas, feedback is welcome.
I found one very old glibc bug report from 2005 where Paul mentions
REGEX_LARGE_OFFSET, not sure if relevant or not:
https://sourceware.org/bugzilla/show_bug.cgi?format=multiple&id=1281
regards,
- assaf
diff --git a/lib/regex_internal.c b/lib/regex_internal.c
index 7f0083b91..0b5f47317 100644
--- a/lib/regex_internal.c
+++ b/lib/regex_internal.c
@@ -409,6 +409,7 @@ build_wcs_upper_buffer (re_string_t *pstr)
if (pstr->offsets == NULL)
{
pstr->offsets = re_malloc (Idx, pstr->bufs_len);
if (pstr->offsets == NULL)
return REG_ESPACE;
+ memset (pstr->offsets, 0xBC, sizeof (Idx) * pstr->bufs_len);