https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93040

            Bug ID: 93040
           Summary: gcc doesn't optimize unaligned accesses to a 16-bit
                    value on the x86 as well as it does a 32-bit value (or
                    clang)
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: miles at gnu dot org
  Target Milestone: ---

Given the following code:

    unsigned short get_unaligned_16 (unsigned char *p)
    {
        return p[0] | (p[1] << 8);
    }

    unsigned int get_unaligned_32 (unsigned char *p)
    {
        return get_unaligned_16 (p) | (get_unaligned_16 (p + 2) << 16);
    }

    unsigned int get_unaligned_32_alt (unsigned char *p)
    {
        return p[0] | (p[1] << 8) | (p[2] << 16) | (p[3] << 24);
    }


... Clang/LLVM (trunk, but it has the same results many versions back)
generates the following very nice output:

    get_unaligned_16:                       # @get_unaligned_16
            movzx   eax, word ptr [rdi]
            ret
    get_unaligned_32:                       # @get_unaligned_32
            mov     eax, dword ptr [rdi]
            ret
    get_unaligned_32_alt:                   # @get_unaligned_32_alt
            mov     eax, dword ptr [rdi]
            ret


Whereas gcc (trunk, and likewise older versions) generates:

    get_unaligned_16:
            movzx   eax, BYTE PTR [rdi+1]
            sal     eax, 8
            mov     edx, eax
            movzx   eax, BYTE PTR [rdi]
            or      eax, edx
            ret
    get_unaligned_32:
            movzx   eax, BYTE PTR [rdi+3]
            sal     eax, 8
            mov     edx, eax
            movzx   eax, BYTE PTR [rdi+2]
            or      eax, edx
            movzx   edx, BYTE PTR [rdi+1]
            sal     eax, 16
            mov     ecx, edx
            movzx   edx, BYTE PTR [rdi]
            sal     ecx, 8
            or      edx, ecx
            movzx   edx, dx
            or      eax, edx
            ret
    get_unaligned_32_alt:
            mov     eax, DWORD PTR [rdi]
            ret


Notice that in the "get_unaligned_32_alt" version, gcc _does_ detect
that this is really an unaligned access to a 32-bit integer and
reduces it to a single instruction on the x86, as that architecture
supports unaligned accesses.
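
For comparison, the same unaligned load is often written with memcpy
rather than with shifts. The following is only a sketch and not part of
the code tested above: the helper names load_le16_memcpy and
load_le32_memcpy are made up for illustration, and the result matches
the shift-based functions only on a little-endian host such as x86.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers, not from the report.  memcpy copies bytes, so p
   needs no particular alignment.  The value equals p[0] | (p[1] << 8)
   (and so on) only when the host is little-endian. */
uint16_t load_le16_memcpy (const unsigned char *p)
{
    uint16_t v;
    memcpy (&v, p, sizeof v);
    return v;
}

uint32_t load_le32_memcpy (const unsigned char *p)
{
    uint32_t v;
    memcpy (&v, p, sizeof v);
    return v;
}
```

Unlike the shift-and-or form, this version depends on host byte order,
so it is not a drop-in replacement in portable code.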

However, for the 16-bit version, "get_unaligned_16", and for
"get_unaligned_32", which is built from it, gcc just emits the
component byte loads, shifts, and ORs.

It does seem curious that gcc manages the 32-bit case, but fails on
the 16-bit case...
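
To make explicit that the two 32-bit forms compute the same value, so
that a single 32-bit load is a valid result for both, here is a small
sketch. It repeats the functions from the report with one change that
is an assumption of this sketch: the operands of the widest shifts are
cast to unsigned int so that the shifts cannot overflow a signed int.
The helper forms_agree is hypothetical.

```c
unsigned short get_unaligned_16 (unsigned char *p)
{
    return p[0] | (p[1] << 8);
}

unsigned int get_unaligned_32 (unsigned char *p)
{
    /* Cast added in this sketch: shift in unsigned int, not int. */
    return get_unaligned_16 (p)
           | ((unsigned int) get_unaligned_16 (p + 2) << 16);
}

unsigned int get_unaligned_32_alt (unsigned char *p)
{
    /* Cast added in this sketch for the same reason. */
    return p[0] | (p[1] << 8) | (p[2] << 16)
           | ((unsigned int) p[3] << 24);
}

/* Hypothetical helper: returns 1 if both 32-bit forms agree on the
   four bytes starting at p. */
int forms_agree (unsigned char *p)
{
    return get_unaligned_32 (p) == get_unaligned_32_alt (p);
}
```

Calling forms_agree on a deliberately misaligned pointer, such as
buf + 1 for some byte array buf, exercises the unaligned case.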

I tested gcc on godbolt.org, and Clang locally (and on godbolt).

Flags used:

   -O2 -march=skylake

-Os and -O3 yield the same results.

Versions:

   gcc (Compiler-Explorer-Build) 10.0.0 20191220 (experimental)
   clang version 10.0.0 (https://github.com/llvm/llvm-project.git b4dfa74a5d80b3602a5315fac2ef5f98b0e63708)
