https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112683
Bug ID: 112683
Summary: Optimizing memcpy range by extending to word bounds
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: antoshkka at gmail dot com
Target Milestone: ---

Consider the following source code, minimized from libstdc++:

```
struct string {
    unsigned long _M_string_length;
    enum { _S_local_capacity = 15 };
    char _M_local_buf[_S_local_capacity + 1];
};

string copy(const string& __str) noexcept {
    string result;

    if (__str._M_string_length > __str._S_local_capacity)
        __builtin_unreachable();

    result._M_string_length = __str._M_string_length;
    __builtin_memcpy(result._M_local_buf, __str._M_local_buf,
                     __str._M_string_length + 1);

    return result;
}
```

Currently, GCC at -O2 emits a long assembly sequence of about 50 instructions: https://godbolt.org/z/a89bh17hd

However, note that:

* `result._M_local_buf` is uninitialized;
* at most 16 bytes are copied into `result._M_local_buf`, which is itself 16 bytes in size;
* the source buffer `__str._M_local_buf` is also 16 bytes, so reading all 16 bytes from it stays in bounds.

So the compiler could optimize the code to always copy 16 bytes. The change in behavior is not observable by the user, because the uninitialized bytes could contain any data, including the same bytes as `__str._M_local_buf`.

Always copying 16 bytes makes the assembly more than 7 times shorter and eliminates the conditional jumps: https://godbolt.org/z/r5GPYTs4Y
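The transformation suggested in this report can be illustrated at the source level. The sketch below is not from the report: `copy_widened` and `observable_prefix_equal` are hypothetical names introduced here. `copy_widened` performs the whole-buffer copy by hand, and `observable_prefix_equal` compares only what a caller may legitimately read (the length and the first `_M_string_length + 1` bytes), which is the sense in which the two versions are interchangeable. Like the original, it assumes GCC or Clang for `__builtin_memcpy` and `__builtin_unreachable`.

```cpp
#include <cassert>
#include <cstring>

// Layout of the minimized libstdc++ example from the report.
struct string {
    unsigned long _M_string_length;
    enum { _S_local_capacity = 15 };
    char _M_local_buf[_S_local_capacity + 1];
};

// Original form: copy the live characters plus the terminating NUL.
string copy(const string& __str) noexcept {
    string result;
    if (__str._M_string_length > __str._S_local_capacity)
        __builtin_unreachable();
    result._M_string_length = __str._M_string_length;
    __builtin_memcpy(result._M_local_buf, __str._M_local_buf,
                     __str._M_string_length + 1);
    return result;
}

// Hypothetical widened form: always copy the full 16-byte buffer.
// Both buffers are 16 bytes, so the read and the write stay in bounds;
// the extra bytes land where `result` would otherwise be uninitialized.
string copy_widened(const string& __str) noexcept {
    string result;
    if (__str._M_string_length > __str._S_local_capacity)
        __builtin_unreachable();
    result._M_string_length = __str._M_string_length;
    __builtin_memcpy(result._M_local_buf, __str._M_local_buf,
                     sizeof(result._M_local_buf));
    return result;
}

// Hypothetical helper: compares only the observable state, i.e. the
// length field and the first _M_string_length + 1 characters.
bool observable_prefix_equal(const string& a, const string& b) {
    // Guard the memcmp below against reading past the 16-byte buffer.
    assert(a._M_string_length <= string::_S_local_capacity);
    return a._M_string_length == b._M_string_length &&
           std::memcmp(a._M_local_buf, b._M_local_buf,
                       a._M_string_length + 1) == 0;
}
```

For a source holding `"hello"` with the remaining buffer bytes filled with `'x'`, both functions agree on the length and the first six bytes. The widened copy additionally carries the `'x'` bytes along, whereas in the original those positions of `result` stay indeterminate, so no caller that respects `_M_string_length` can tell the two apart.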