https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102391
Gabriel Ravier <gabravier at gmail dot com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Failure to optimize 2 8-bit |Failure to optimize
                   |loads into a single 16-bit  |adjacent 8-bit loads into a
                   |load                        |single bigger load
--- Comment #1 from Gabriel Ravier <gabravier at gmail dot com> ---
Note: the same missed optimization also applies to bigger sizes:
uint32_t HeaderReadU32LE(int offset, uint8_t *RomHeader)
{
    return RomHeader[offset] |
           (RomHeader[offset + 1] << 8) |
           (RomHeader[offset + 2] << 16) |
           (RomHeader[offset + 3] << 24);
}
On AMD64, GCC outputs this:
HeaderReadU32LE:
        movsx   rdi, edi
        movzx   eax, BYTE PTR [rsi+1+rdi]
        movzx   edx, BYTE PTR [rsi+2+rdi]
        sal     eax, 8
        sal     edx, 16
        or      eax, edx
        movzx   edx, BYTE PTR [rsi+rdi]
        or      eax, edx
        movzx   edx, BYTE PTR [rsi+3+rdi]
        sal     edx, 24
        or      eax, edx
        ret
LLVM manages this:
HeaderReadU32LE:
        movsxd  rax, edi
        mov     eax, dword ptr [rsi + rax]
        ret
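
For reference, a minimal sketch of what the single 32-bit load corresponds to at the source level. None of the names below come from the report: read_u32le_shift, read_u32le_memcpy and check_readers_agree are hypothetical helpers. The shift version casts each byte to uint32_t so that the << 24 does not shift a promoted int into its sign bit. The memcpy version assumes GCC or Clang (it relies on __BYTE_ORDER__ and __builtin_bswap32). Compilers commonly turn a fixed-size memcpy like this into one load, but that is an expectation worth checking for a given target and version, not a guarantee.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Shift-and-OR reader, same pattern as the test case in this report.
 * Each byte is widened to uint32_t before shifting. */
static uint32_t read_u32le_shift(const uint8_t *p)
{
    return (uint32_t)p[0] |
           ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) |
           ((uint32_t)p[3] << 24);
}

/* memcpy-based reader (assumption: GCC/Clang predefined macros and builtin).
 * On a little-endian host the copied value is already the little-endian
 * reading; on a big-endian host the bytes are swapped afterwards. */
static uint32_t read_u32le_memcpy(const uint8_t *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    v = __builtin_bswap32(v);
#endif
    return v;
}

/* Asserts that both readers return the same value at every offset
 * where four bytes are available. */
static void check_readers_agree(const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i + 4 <= len; i++)
        assert(read_u32le_shift(buf + i) == read_u32le_memcpy(buf + i));
}
```

Both readers return the same value for any 4-byte window, so the shift form can in principle be replaced by the single load shown in the LLVM output above.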