https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102391

Gabriel Ravier <gabravier at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Failure to optimize 2 8-bit |Failure to optimize
                   |loads into a single 16-bit  |adjacent 8-bit loads into a
                   |load                        |single bigger load

--- Comment #1 from Gabriel Ravier <gabravier at gmail dot com> ---
Note: the same missed optimization also occurs with larger sizes, e.g. four
adjacent 8-bit loads that could be combined into a single 32-bit load:

uint32_t HeaderReadU32LE(int offset, uint8_t *RomHeader)
{
    return RomHeader[offset] |
        (RomHeader[offset + 1] << 8) |
        (RomHeader[offset + 2] << 16) |
        (RomHeader[offset + 3] << 24);
}

On AMD64, GCC outputs this:

HeaderReadU32LE:
  movsx rdi, edi
  movzx eax, BYTE PTR [rsi+1+rdi]
  movzx edx, BYTE PTR [rsi+2+rdi]
  sal eax, 8
  sal edx, 16
  or eax, edx
  movzx edx, BYTE PTR [rsi+rdi]
  or eax, edx
  movzx edx, BYTE PTR [rsi+3+rdi]
  sal edx, 24
  or eax, edx
  ret

LLVM manages this:

HeaderReadU32LE:
  movsxd rax, edi
  mov eax, dword ptr [rsi + rax]
  ret
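
For comparison, here is a minimal sketch of the two forms side by side. It is not
part of the original report. The names read_u32le_bytes and read_u32_native
are hypothetical. The byte-wise version adds uint32_t casts so that the
shift by 24 cannot reach the sign bit of a promoted int. The memcpy version is
the single-load form written out by hand, and it returns the same value only
on a little-endian host such as AMD64.

```c
#include <stdint.h>
#include <string.h>

/* Byte-wise little-endian read, same shape as the test case above.
   The uint32_t casts keep every shift in unsigned arithmetic. */
uint32_t read_u32le_bytes(const uint8_t *p, int offset)
{
    return (uint32_t)p[offset] |
        ((uint32_t)p[offset + 1] << 8) |
        ((uint32_t)p[offset + 2] << 16) |
        ((uint32_t)p[offset + 3] << 24);
}

/* Hand-written single 32-bit read in host byte order.
   Assumption: equals read_u32le_bytes() only on a little-endian host. */
uint32_t read_u32_native(const uint8_t *p, int offset)
{
    uint32_t v;
    memcpy(&v, p + offset, sizeof v);  /* memcpy permits unaligned access */
    return v;
}
```

On a little-endian target both functions should return the same value for any
in-bounds offset, which is why merging the four byte loads into one is a valid
transformation there.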
