https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94837

            Bug ID: 94837
           Summary: Failure to optimize out spurious movbe into bswap
           Product: gcc
           Version: 10.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: gabravier at gmail dot com
  Target Milestone: ---

#include <stdint.h>

float swapFloat(float x)
{
    union
    {
        float f;
        uint32_t u32;
    } swapper;

    swapper.f = x;
    swapper.u32 = __builtin_bswap32(swapper.u32);
    return swapper.f;
}

For this function, on x86-64 with `-O3 -mmovbe`, LLVM outputs this:

swapFloat(float): # @swapFloat(float)
  movd eax, xmm0
  bswap eax
  movd xmm0, eax
  ret

GCC instead outputs this:

swapFloat(float):
  movd DWORD PTR [rsp-4], xmm0
  movbe eax, DWORD PTR [rsp-4]
  movd xmm0, eax
  ret

It seems highly likely to me that spilling the value to the stack and
reloading it with `movbe` is much slower than moving it to a register with
`movd` and using a direct `bswap`.
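
For comparison, here is a minimal sketch of the same byte swap written with
`memcpy` instead of a union. The name `swapFloatMemcpy` is hypothetical and
not part of the original test case, and whether GCC emits the same
store-plus-`movbe` sequence for it has not been checked here; it may simply
be a useful second reproducer when investigating.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical variant of swapFloat: same byte swap, but the float's bits
   are copied with memcpy instead of being read through a union. */
float swapFloatMemcpy(float x)
{
    uint32_t u32;

    memcpy(&u32, &x, sizeof u32);   /* float bits -> integer */
    u32 = __builtin_bswap32(u32);   /* reverse the four bytes */
    memcpy(&x, &u32, sizeof x);     /* integer -> float bits */
    return x;
}
```

Compiling both functions with the same flags (`-O3 -mmovbe`) and comparing
the assembly should show whether the spill is tied to the union or to the
float-to-integer move in general.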
