https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118113

            Bug ID: 118113
           Summary: std::regex construction from string literal causes
                    out-of-bounds access when compiled with O2 and LTO
           Product: gcc
           Version: 14.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: david.cortes.rivera at gmail dot com
  Target Milestone: ---

Forwarding an example by Ivan Krylov, originally posted here:
https://stat.ethz.ch/pipermail/r-package-devel/2024q4/011309.html

Constructing a std::regex from a string literal causes an out-of-bounds read
when both -O2 and -flto are used together. As far as I can tell, it is
reproducible with GCC versions 14.2 and 12.2.

Code to reproduce (courtesy of Ivan Krylov):
#include <iostream>
#include <regex>
int main() {
 std::string s{" gjdshlkhj \" lsjkhkljh "};
 const char * rx = "\"";
 std::cout
  << std::regex_replace(s, std::regex(rx), "\\\"") // <-- line 7
  << std::endl;
 // the code below is required for the problem to happen above!
 for (int i = 0; i < 100; ++i) volatile std::regex rxx(rx);
}
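
For reference, a minimal sketch of what the replacement in the program above is
expected to produce in a build that does not show the problem. The helper name
escape_quotes is made up for illustration and is not part of the original
example; it only wraps the same std::regex and std::regex_replace calls.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: performs the same replacement as the reproducer,
// turning every double quote in `s` into a backslash followed by a quote.
inline std::string escape_quotes(const std::string& s) {
    const char* rx = "\"";  // one-character pattern, as in the reproducer
    return std::regex_replace(s, std::regex(rx), "\\\"");
}
```

Called with the string from the example, escape_quotes should return the same
text with the single double quote preceded by a backslash.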

If compiled as follows:
g++ -fsanitize=address -O2 -flto=auto bugged_regex.cpp

Running the resulting binary then produces the following error message:
=================================================================
==33379==ERROR: AddressSanitizer: global-buffer-overflow on address
0x563bfc3d1482 at pc 0x563bfc37cf37 bp 0x7ffd574b4a70 sp 0x7ffd574b4a68
READ of size 1 at 0x563bfc3d1482 thread T0
    #0 0x563bfc37cf36 in std::__detail::_Scanner<char>::_M_advance()
(/home/david/c_quicktest/a.out+0x1df36)
    #1 0x563bfc384aee in
std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_try_char()
(/home/david/c_quicktest/a.out+0x25aee)
    #2 0x563bfc39838a in
std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_alternative()
(/home/david/c_quicktest/a.out+0x3938a)
    #3 0x563bfc398d40 in
std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_alternative()
(/home/david/c_quicktest/a.out+0x39d40)
    #4 0x563bfc3a5a6d in
std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_disjunction()
(/home/david/c_quicktest/a.out+0x46a6d)
    #5 0x563bfc3b0c75 in
std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_Compiler(char
const*, char const*, std::locale const&,
std::regex_constants::syntax_option_type) [clone .constprop.0]
(/home/david/c_quicktest/a.out+0x51c75)
    #6 0x563bfc3b1f78 in std::__cxx11::basic_regex<char,
std::__cxx11::regex_traits<char> >::_M_compile(char const*, char const*,
std::regex_constants::syntax_option_type) [clone .constprop.0]
(/home/david/c_quicktest/a.out+0x52f78)
    #7 0x563bfc36c9e4 in main (/home/david/c_quicktest/a.out+0xd9e4)
    #8 0x7f9978d85249 in __libc_start_call_main
../sysdeps/nptl/libc_start_call_main.h:58
    #9 0x7f9978d85304 in __libc_start_main_impl ../csu/libc-start.c:360
    #10 0x563bfc3723eb  (/home/david/c_quicktest/a.out+0x133eb)

0x563bfc3d1482 is located 62 bytes before global variable '*.LC41' defined in
'./a.ltrans1.ltrans' (0x563bfc3d14c0) of size 145
  '*.LC41' is ascii string 'Number of NFA states exceeds limit. Please use
shorter regex string, or use smaller brace expression, or make
_GLIBCXX_REGEX_STATE_LIMIT larger.'
0x563bfc3d1482 is located 0 bytes after global variable '*.LC40' defined in
'./a.ltrans1.ltrans' (0x563bfc3d1480) of size 2
  '*.LC40' is ascii string '"'
SUMMARY: AddressSanitizer: global-buffer-overflow
(/home/david/c_quicktest/a.out+0x1df36) in
std::__detail::_Scanner<char>::_M_advance()
Shadow bytes around the buggy address:
  0x563bfc3d1200: 01 f9 f9 f9 f9 f9 f9 f9 00 00 00 02 f9 f9 f9 f9
  0x563bfc3d1280: 00 00 00 00 00 00 f9 f9 f9 f9 f9 f9 00 00 00 00
  0x563bfc3d1300: 05 f9 f9 f9 f9 f9 f9 f9 00 07 f9 f9 f9 f9 f9 f9
  0x563bfc3d1380: 07 f9 f9 f9 f9 f9 f9 f9 00 05 f9 f9 f9 f9 f9 f9
  0x563bfc3d1400: 00 f9 f9 f9 f9 f9 f9 f9 00 06 f9 f9 f9 f9 f9 f9
=>0x563bfc3d1480:[02]f9 f9 f9 f9 f9 f9 f9 00 00 00 00 00 00 00 00
  0x563bfc3d1500: 00 00 00 00 00 00 00 00 00 00 01 f9 f9 f9 f9 f9
  0x563bfc3d1580: 00 00 00 00 00 00 01 f9 f9 f9 f9 f9 00 00 00 00
  0x563bfc3d1600: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01
  0x563bfc3d1680: f9 f9 f9 f9 00 00 00 03 f9 f9 f9 f9 00 00 00 01
  0x563bfc3d1700: f9 f9 f9 f9 00 00 00 00 00 00 05 f9 f9 f9 f9 f9
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==33379==ABORTING
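
Reading the addresses in the report above: '*.LC40' starts at 0x563bfc3d1480
and has size 2 (the quote character plus the terminating NUL), while the
faulting read is at 0x563bfc3d1482. The sketch below just restates that
arithmetic. It is an observation from the log, not a diagnosis, and the names
rx_literal, literal_size, fault_offset and pattern_length are chosen here for
illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Same pattern literal as in the reproducer: one quote character plus NUL.
constexpr char rx_literal[] = "\"";

// Bytes occupied by the literal, including the terminator.
// ASan reports '*.LC40' as having size 2.
constexpr std::size_t literal_size = sizeof(rx_literal);

// Offset of the faulting read from the start of '*.LC40',
// computed from the two addresses printed in the ASan report.
constexpr std::uint64_t fault_offset =
    0x563bfc3d1482ULL - 0x563bfc3d1480ULL;

static_assert(literal_size == 2, "quote character plus NUL terminator");
static_assert(fault_offset == literal_size,
              "the read lands on the first byte after the terminator");

// std::regex(const char*) is specified to use the range
// [p, p + char_traits<char>::length(p)), which is one character here.
inline std::size_t pattern_length() { return std::strlen(rx_literal); }
```

So the read is at offset 2, while the pattern itself is only one character
long and the terminator sits at offset 1.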

Removing either the -O2 or the -flto makes the problem go away.

Note that the ASan error was originally reported for code that does not use
volatile, at this line:
https://github.com/david-cortes/isotree/blob/1f84128a03bb6fc5eecd1de7aebf4b745b54fa1e/src/formatted_exporters.cpp#L332
