https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102391

Gabriel Ravier <gabravier at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Failure to optimize 2 8-bit |Failure to optimize
                   |loads into a single 16-bit  |adjacent 8-bit loads into a
                   |load                        |single bigger load

--- Comment #1 from Gabriel Ravier <gabravier at gmail dot com> ---
Note: this also equivalently works on bigger sizes:

uint32_t HeaderReadU32LE(int offset, uint8_t *RomHeader)
{
    return RomHeader[offset] | (RomHeader[offset + 1] << 8) |
           (RomHeader[offset + 2] << 16) | (RomHeader[offset + 3] << 24);
}

On AMD64, GCC outputs this:

HeaderReadU32LE:
  movsx rdi, edi
  movzx eax, BYTE PTR [rsi+1+rdi]
  movzx edx, BYTE PTR [rsi+2+rdi]
  sal eax, 8
  sal edx, 16
  or eax, edx
  movzx edx, BYTE PTR [rsi+rdi]
  or eax, edx
  movzx edx, BYTE PTR [rsi+3+rdi]
  sal edx, 24
  or eax, edx
  ret

LLVM manages this:

HeaderReadU32LE:
  movsxd rax, edi
  mov eax, dword ptr [rsi + rax]
  ret
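
For comparison, here is a minimal sketch of two ways to express the same 32-bit little-endian read. It is not part of the original report: the names read_u32le_shifts and read_u32le_memcpy are hypothetical. The first variant mirrors the test case but casts each byte to uint32_t, so that the << 24 shift cannot overflow a signed int when the top byte is 0x80 or above. The second uses memcpy, which compilers can usually lower to one load, but it returns the same value only on a little-endian host such as AMD64.

```c
#include <stdint.h>
#include <string.h>

/* Byte-by-byte little-endian read, as in the test case above, but with
   explicit casts so every shift is performed on uint32_t, not on int. */
uint32_t read_u32le_shifts(int offset, const uint8_t *buf)
{
    return (uint32_t)buf[offset]
         | ((uint32_t)buf[offset + 1] << 8)
         | ((uint32_t)buf[offset + 2] << 16)
         | ((uint32_t)buf[offset + 3] << 24);
}

/* Assumption: little-endian host. memcpy copies the four bytes in memory
   order, so on a big-endian host the result would be byte-swapped
   relative to read_u32le_shifts. */
uint32_t read_u32le_memcpy(int offset, const uint8_t *buf)
{
    uint32_t value;
    memcpy(&value, buf + offset, sizeof value);
    return value;
}
```

The shift-based form is the portable one, which is why it is worth having the compiler recognise it; the memcpy form only serves as a reference for the single-load code one would hope to get on little-endian targets.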