https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93040
Bug ID: 93040
Summary: gcc doesn't optimize unaligned accesses to a 16-bit value on the x86 as well as it does a 32-bit value (or clang)
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: miles at gnu dot org
Target Milestone: ---

Given the following code:

    unsigned short get_unaligned_16 (unsigned char *p)
    {
      return p[0] | (p[1] << 8);
    }

    unsigned int get_unaligned_32 (unsigned char *p)
    {
      return get_unaligned_16 (p) | (get_unaligned_16 (p + 2) << 16);
    }

    unsigned int get_unaligned_32_alt (unsigned char *p)
    {
      return p[0] | (p[1] << 8) | (p[2] << 16) | (p[3] << 24);
    }

...

Clang/LLVM (trunk, but it has the same results many versions back) generates the following very nice output:

    get_unaligned_16:                       # @get_unaligned_16
            movzx   eax, word ptr [rdi]
            ret
    get_unaligned_32:                       # @get_unaligned_32
            mov     eax, dword ptr [rdi]
            ret
    get_unaligned_32_alt:                   # @get_unaligned_32_alt
            mov     eax, dword ptr [rdi]
            ret

Whereas gcc (trunk but ditto) generates:

    get_unaligned_16:
            movzx   eax, BYTE PTR [rdi+1]
            sal     eax, 8
            mov     edx, eax
            movzx   eax, BYTE PTR [rdi]
            or      eax, edx
            ret
    get_unaligned_32:
            movzx   eax, BYTE PTR [rdi+3]
            sal     eax, 8
            mov     edx, eax
            movzx   eax, BYTE PTR [rdi+2]
            or      eax, edx
            movzx   edx, BYTE PTR [rdi+1]
            sal     eax, 16
            mov     ecx, edx
            movzx   edx, BYTE PTR [rdi]
            sal     ecx, 8
            or      edx, ecx
            movzx   edx, dx
            or      eax, edx
            ret
    get_unaligned_32_alt:
            mov     eax, DWORD PTR [rdi]
            ret

Notice that in the "get_unaligned_32_alt" version, gcc _does_ detect that this is really an unaligned access to a 32-bit integer and reduces it to a single instruction on the x86, as that architecture supports unaligned accesses.

However, for the 16-bit version, "get_unaligned_16", and for "get_unaligned_32", which is derived from it, gcc just emits the component bit-munching operations.

It does seem curious that gcc manages the 32-bit case, but fails on the 16-bit case...

I tested gcc on godbolt.com, and Clang locally (and on godbolt).

Flags used: -O2 -march=skylake

-Os and -O3 yield the same results.

Versions:

gcc (Compiler-Explorer-Build) 10.0.0 20191220 (experimental)
clang version 10.0.0 (https://github.com/llvm/llvm-project.git b4dfa74a5d80b3602a5315fac2ef5f98b0e63708)
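
For anyone wanting to reproduce or work around this, here is a minimal sketch that puts the shift-and-or loads next to a memcpy-based alternative. None of these helper names come from the report; they are made up for illustration. The memcpy variants return the value in host byte order, so they only match the shift-based ones on a little-endian target such as x86. Compilers commonly lower a fixed-size memcpy to a plain load, but that has not been checked against the gcc build listed above. The explicit unsigned casts are an addition of this sketch: they avoid shifting a promoted signed int into its sign bit when the top byte is 0x80 or larger.

```c
#include <stdint.h>
#include <string.h>

/* Shift-and-or loads, little-endian by construction.
   Hypothetical names; the casts keep all shifts in unsigned arithmetic. */
static inline uint16_t load_le16_shift(const unsigned char *p)
{
    return (uint16_t)((unsigned)p[0] | ((unsigned)p[1] << 8));
}

static inline uint32_t load_le32_shift(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* memcpy-based loads: valid for any alignment, but the result is in
   HOST byte order (an assumption to keep in mind on big-endian targets). */
static inline uint16_t load_host16_memcpy(const unsigned char *p)
{
    uint16_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

static inline uint32_t load_host32_memcpy(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Returns 1 when the lowest-addressed byte of a multi-byte integer
   holds its least significant bits, i.e. the host is little-endian. */
static inline int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}
```

Compiling this with the flags from the report and comparing the assembly for the four load helpers should show whether the 16-bit shift form is still left as separate byte loads while the other forms collapse to a single load.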