https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118113
Bug ID: 118113
Summary: std::regex construction from string literal causes out-of-bounds access when compiled with O2 and LTO
Product: gcc
Version: 14.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: david.cortes.rivera at gmail dot com
Target Milestone: ---

Forwarding an example by Ivan Krylov from here:
https://stat.ethz.ch/pipermail/r-package-devel/2024q4/011309.html

Creating a regex from a string literal will cause an out-of-bounds access when both -O2 and -flto are used together. It is reproducible with GCC versions 14.2 and 12.2 as far as I can tell.

Code to reproduce (courtesy of Ivan Krylov):

#include <iostream>
#include <regex>

int main() {
    std::string s{" gjdshlkhj \" lsjkhkljh "};
    const char * rx = "\"";
    std::cout << std::regex_replace(s, std::regex(rx), "\\\"") // <-- line 7
        << std::endl;
    // the code below is required for the problem to happen above!
    for (int i = 0; i < 100; ++i) volatile std::regex rxx(rx);
}

If compiled as follows:

g++ -fsanitize=address -O2 -flto=auto bugged_regex.cpp

Then running it will result in the following error message:

=================================================================
==33379==ERROR: AddressSanitizer: global-buffer-overflow on address 0x563bfc3d1482 at pc 0x563bfc37cf37 bp 0x7ffd574b4a70 sp 0x7ffd574b4a68
READ of size 1 at 0x563bfc3d1482 thread T0
    #0 0x563bfc37cf36 in std::__detail::_Scanner<char>::_M_advance() (/home/david/c_quicktest/a.out+0x1df36)
    #1 0x563bfc384aee in std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_try_char() (/home/david/c_quicktest/a.out+0x25aee)
    #2 0x563bfc39838a in std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_alternative() (/home/david/c_quicktest/a.out+0x3938a)
    #3 0x563bfc398d40 in std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_alternative() (/home/david/c_quicktest/a.out+0x39d40)
    #4 0x563bfc3a5a6d in std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_disjunction() (/home/david/c_quicktest/a.out+0x46a6d)
    #5 0x563bfc3b0c75 in std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_Compiler(char const*, char const*, std::locale const&, std::regex_constants::syntax_option_type) [clone .constprop.0] (/home/david/c_quicktest/a.out+0x51c75)
    #6 0x563bfc3b1f78 in std::__cxx11::basic_regex<char, std::__cxx11::regex_traits<char> >::_M_compile(char const*, char const*, std::regex_constants::syntax_option_type) [clone .constprop.0] (/home/david/c_quicktest/a.out+0x52f78)
    #7 0x563bfc36c9e4 in main (/home/david/c_quicktest/a.out+0xd9e4)
    #8 0x7f9978d85249 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
    #9 0x7f9978d85304 in __libc_start_main_impl ../csu/libc-start.c:360
    #10 0x563bfc3723eb (/home/david/c_quicktest/a.out+0x133eb)

0x563bfc3d1482 is located 62 bytes before global variable '*.LC41' defined in './a.ltrans1.ltrans' (0x563bfc3d14c0) of size 145
  '*.LC41' is ascii string 'Number of NFA states exceeds limit. Please use shorter regex string, or use smaller brace expression, or make _GLIBCXX_REGEX_STATE_LIMIT larger.'
0x563bfc3d1482 is located 0 bytes after global variable '*.LC40' defined in './a.ltrans1.ltrans' (0x563bfc3d1480) of size 2
  '*.LC40' is ascii string '"'
SUMMARY: AddressSanitizer: global-buffer-overflow (/home/david/c_quicktest/a.out+0x1df36) in std::__detail::_Scanner<char>::_M_advance()
Shadow bytes around the buggy address:
  0x563bfc3d1200: 01 f9 f9 f9 f9 f9 f9 f9 00 00 00 02 f9 f9 f9 f9
  0x563bfc3d1280: 00 00 00 00 00 00 f9 f9 f9 f9 f9 f9 00 00 00 00
  0x563bfc3d1300: 05 f9 f9 f9 f9 f9 f9 f9 00 07 f9 f9 f9 f9 f9 f9
  0x563bfc3d1380: 07 f9 f9 f9 f9 f9 f9 f9 00 05 f9 f9 f9 f9 f9 f9
  0x563bfc3d1400: 00 f9 f9 f9 f9 f9 f9 f9 00 06 f9 f9 f9 f9 f9 f9
=>0x563bfc3d1480:[02]f9 f9 f9 f9 f9 f9 f9 00 00 00 00 00 00 00 00
  0x563bfc3d1500: 00 00 00 00 00 00 00 00 00 00 01 f9 f9 f9 f9 f9
  0x563bfc3d1580: 00 00 00 00 00 00 01 f9 f9 f9 f9 f9 00 00 00 00
  0x563bfc3d1600: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01
  0x563bfc3d1680: f9 f9 f9 f9 00 00 00 03 f9 f9 f9 f9 00 00 00 01
  0x563bfc3d1700: f9 f9 f9 f9 00 00 00 00 00 00 05 f9 f9 f9 f9 f9
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==33379==ABORTING

Removing either -O2 or -flto makes the problem go away.

Note that the ASAN error was originally shown for code that didn't have volatile, in this line of code here:
https://github.com/david-cortes/isotree/blob/1f84128a03bb6fc5eecd1de7aebf4b745b54fa1e/src/formatted_exporters.cpp#L332