https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112683

            Bug ID: 112683
           Summary: Optimizing memcpy range by extending to word bounds
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: antoshkka at gmail dot com
  Target Milestone: ---

Consider the following minimized source code from libstdc++:

```
struct string {
    unsigned long _M_string_length;
    enum { _S_local_capacity = 15 };
    char _M_local_buf[_S_local_capacity + 1];
};

string copy(const string& __str) noexcept {
    string result;

    if (__str._M_string_length > __str._S_local_capacity)
        __builtin_unreachable();

    result._M_string_length = __str._M_string_length;
    __builtin_memcpy(result._M_local_buf, __str._M_local_buf,
                     __str._M_string_length + 1);

    return result;
}
```

Right now GCC at -O2 emits a long assembly sequence of ~50 instructions:
https://godbolt.org/z/a89bh17hd

However, note that
* the `result._M_local_buf` is uninitialized,
* there are at most 16 bytes to copy to `result._M_local_buf`, which is 16
bytes in size

So the compiler could optimize the code to always copy 16 bytes. The behavior
change is not observable by the user, as the uninitialized bytes could contain
any data, including the same bytes as `__str._M_local_buf`.

As a result of always copying 16 bytes, the assembly becomes more than 7 times
shorter and the conditional jumps go away: https://godbolt.org/z/r5GPYTs4Y
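
For reference, here is a minimal sketch of what the hand-written equivalent of the proposed transformation might look like. It is an illustration only: the names `copy_fixed` and `check_copy_fixed` are hypothetical and do not come from libstdc++, and the code is not necessarily identical to what the link above contains. It assumes that reading all 16 bytes of the source buffer is acceptable, which holds here because the buffer is a fixed-size array member.

```cpp
#include <cassert>
#include <cstring>

struct string {
    unsigned long _M_string_length;
    enum { _S_local_capacity = 15 };
    char _M_local_buf[_S_local_capacity + 1];
};

// Hypothetical variant of copy(): the copy length is a compile-time constant
// (the whole local buffer) instead of _M_string_length + 1.
string copy_fixed(const string& __str) noexcept {
    string result;

    if (__str._M_string_length > __str._S_local_capacity)
        __builtin_unreachable();

    result._M_string_length = __str._M_string_length;
    // Always copy sizeof(_M_local_buf) == 16 bytes; the bytes past the
    // terminator were uninitialized in the original version anyway.
    static_assert(sizeof(result._M_local_buf) == 16, "expected a 16-byte buffer");
    __builtin_memcpy(result._M_local_buf, __str._M_local_buf,
                     sizeof(result._M_local_buf));

    return result;
}

// Hypothetical sanity check: the string contents seen by the user are unchanged.
void check_copy_fixed() {
    string src{};  // zero-initialize so that all 16 source bytes are defined
    src._M_string_length = 5;
    std::memcpy(src._M_local_buf, "hello", 6);

    string dst = copy_fixed(src);
    assert(dst._M_string_length == 5);
    assert(std::strcmp(dst._M_local_buf, "hello") == 0);
}
```

The request in this report is for the compiler to derive the constant-size copy on its own from the `_M_string_length <= 15` range information, so that the library source does not have to be changed.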
