https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79507
            Bug ID: 79507
           Summary: Incorrect array item inlining when ASAN is enabled
           Product: gcc
           Version: lto
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: zherczeg at inf dot u-szeged.hu
  Target Milestone: ---

Hi,

TL;DR: the inlined address of a static array item is invalid.

GCC version: gcc-5 (Ubuntu 5.4.1-2ubuntu1~14.04) 5.4.1 20160904

First, you need jerryscript:
https://github.com/jerryscript-project/jerryscript

Check out commit 66683e5d4b3d9474e86900b86be29105524b740c

Compile (note: LTO enabled):

tools/build.py --clean --compile-flag="-fsanitize=address -m32 -fno-omit-frame-pointer -fno-common -g" --linker-flag=-fsanitize=address --jerry-libc=off --static-link=off --strip=off --system-allocator=on

Put the following into a test file (e.g. test.js):

''.replace(/^/g, 'b')

Then run it:

build/bin/jerry test.js

Result: ASAN error

What happens?

There is a function in lit-magic-strings.c which returns a string based on an
ID. The strings are stored in a static const array.

const lit_utf8_byte_t *
lit_get_magic_string_utf8 (lit_magic_string_id_t id) /**< magic string id */
{
  static const lit_utf8_byte_t * const lit_magic_strings[] JERRY_CONST_DATA =
  {
#define LIT_MAGIC_STRING_FIRST_STRING_WITH_SIZE(size, id)
#define LIT_MAGIC_STRING_DEF(id, utf8_string) \
    (const lit_utf8_byte_t *) utf8_string,
#include "lit-magic-strings.inc.h"
#undef LIT_MAGIC_STRING_DEF
#undef LIT_MAGIC_STRING_FIRST_STRING_WITH_SIZE
  };

  JERRY_ASSERT (id < LIT_MAGIC_STRING__COUNT);

  return lit_magic_strings[id];
} /* lit_get_magic_string_utf8 */

The compiler tries to inline this function, which is obviously clever.
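
For readers without the jerryscript sources at hand, the table in that function is built with an X-macro list. Below is a minimal sketch of the same pattern. All DEMO_/demo_ names and the three-entry list are hypothetical stand-ins for the contents of lit-magic-strings.inc.h, not the real definitions.

```c
#include <assert.h>

/* Hypothetical stand-in for lit-magic-strings.inc.h: one DEF per (id, string). */
#define DEMO_MAGIC_STRINGS(DEF) \
  DEF (DEMO_MAGIC_STRING__EMPTY, "") \
  DEF (DEMO_MAGIC_STRING_LENGTH, "length") \
  DEF (DEMO_MAGIC_STRING_REPLACE, "replace")

/* First expansion: the id enum. The trailing member ends up as the count. */
typedef enum
{
#define DEMO_DEF_ID(id, utf8_string) id,
  DEMO_MAGIC_STRINGS (DEMO_DEF_ID)
#undef DEMO_DEF_ID
  DEMO_MAGIC_STRING__COUNT
} demo_magic_string_id_t;

/* Second expansion: the string table, in the same order as the enum,
 * kept as a function-local static const array like the original. */
const unsigned char *
demo_get_magic_string_utf8 (demo_magic_string_id_t id)
{
  static const unsigned char * const demo_magic_strings[] =
  {
#define DEMO_DEF_STRING(id, utf8_string) (const unsigned char *) utf8_string,
    DEMO_MAGIC_STRINGS (DEMO_DEF_STRING)
#undef DEMO_DEF_STRING
  };

  assert (id < DEMO_MAGIC_STRING__COUNT);

  return demo_magic_strings[id];
}
```

Because the enum and the table are generated from the same list, the id with value 0 always selects the first table entry, which is the property the rest of this report relies on.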

In ecma_regexp_exec_helper (ecma-regexp-object.c) you can find the following
code:

  ECMA_STRING_TO_UTF8_STRING (input_string_p, input_buffer_p, input_buffer_size);

  if (input_buffer_size == 0u)
  {
    input_curr_p = lit_get_magic_string_utf8 (LIT_MAGIC_STRING__EMPTY);
  }
  else
  {
    input_curr_p = input_buffer_p;
  }

In the example above the string is empty, so input_buffer_size == 0, and
input_curr_p is loaded by the following instructions:

0x08060528 <ecma_regexp_exec_helper+333>: mov $0x80ba160,%esi
0x0806052d <ecma_regexp_exec_helper+338>: mov %eax,%edx   (not relevant, instruction scheduling)
0x0806052f <ecma_regexp_exec_helper+340>: mov %eax,%edi   (not relevant, instruction scheduling)
0x08060531 <ecma_regexp_exec_helper+342>: test %ecx,%ecx
0x08060533 <ecma_regexp_exec_helper+344>: cmovne -0x188(%ebp),%esi

So input_curr_p receives the value 0x80ba160. This value MUST be the same as
input_buffer_p, but it is not when these compiler options are used.

The ecma_string_raw_chars function also calls lit_get_magic_string_utf8, but
with an indirect id:

0x080761c8 <ecma_string_raw_chars+466>: lea 0x80b1700(,%edx,4),%edi

(gdb) x 0x80b1700
0x80b1700 <lit_magic_strings.3362.9502>: 0x080ad940

As you can see, the first item of the array (LIT_MAGIC_STRING__EMPTY is equal
to 0) is 0x080ad940. Because 0x80ba160 != 0x080ad940, the code crashes later.
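
The mismatch described above can be expressed as a small standalone check. This is only a sketch of the invariant (a constant-id lookup and a runtime-id lookup must return the same pointer), not a confirmed reduced test case: all demo_ names are hypothetical, the two-entry table is made up, and it has not been verified to trigger the bug on the reported compiler. The noinline attribute is a GCC extension, used here to keep the runtime-id path as an indexed load.

```c
#include <assert.h>
#include <stdio.h>

typedef unsigned char demo_utf8_byte_t;

/* Same shape as lit_get_magic_string_utf8: a function-local static const
 * table of string pointers, indexed by id. The contents are made up. */
static const demo_utf8_byte_t *
demo_get_magic_string (unsigned int id)
{
  static const demo_utf8_byte_t * const demo_magic_strings[] =
  {
    (const demo_utf8_byte_t *) "",
    (const demo_utf8_byte_t *) "length",
  };

  assert (id < 2u);

  return demo_magic_strings[id];
}

/* Constant id: the compiler may inline the call and fold the lookup
 * into an immediate address, as in ecma_regexp_exec_helper. */
const demo_utf8_byte_t *
demo_lookup_constant (void)
{
  return demo_get_magic_string (0);
}

/* Runtime id: read through a volatile pointer so the index is not known
 * at compile time, as in ecma_string_raw_chars. */
__attribute__ ((noinline)) const demo_utf8_byte_t *
demo_lookup_indirect (volatile unsigned int *id_p)
{
  return demo_get_magic_string (*id_p);
}

/* Returns 1 when both paths yield the same address, 0 otherwise. */
int
demo_paths_agree (void)
{
  volatile unsigned int id = 0;
  const demo_utf8_byte_t *direct_p = demo_lookup_constant ();
  const demo_utf8_byte_t *indirect_p = demo_lookup_indirect (&id);

  printf ("constant id: %p\nruntime id:  %p\n",
          (const void *) direct_p, (const void *) indirect_p);

  return direct_p == indirect_p;
}
```

Calling demo_paths_agree () from main and building with the compile flags from this report should return 1 on a correct toolchain; a return value of 0 would correspond to the 0x80ba160 vs. 0x080ad940 mismatch shown above.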