https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97887

            Bug ID: 97887
           Summary: Failure to optimize neg plus div to avoid using x87
                    floating point stack
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: gabravier at gmail dot com
  Target Milestone: ---

float f(float a)
{
    return -a / a;
}

On x86-64 with -O3, LLVM outputs this:

.LCPI0_0:
  .long 0x80000000 # float -0
  .long 0x80000000 # float -0
  .long 0x80000000 # float -0
  .long 0x80000000 # float -0
f(float):
  movaps xmm1, xmmword ptr [rip + .LCPI0_0] # xmm1 = [-0.0E+0,-0.0E+0,-0.0E+0,-0.0E+0]
  xorps xmm1, xmm0
  divss xmm1, xmm0
  movaps xmm0, xmm1
  ret

GCC outputs this:

f(float):
  movss DWORD PTR [rsp-4], xmm0
  fld DWORD PTR [rsp-4]
  movaps xmm1, xmm0
  fchs
  fstp DWORD PTR [rsp-4]
  movss xmm0, DWORD PTR [rsp-4]
  divss xmm0, xmm1
  ret

I'm *pretty sure* that spilling the value to memory and loading it into the x87
stack just to negate it (fld/fchs/fstp, especially mixed with SSE instructions)
is much slower than doing the negation with SSE instructions, as LLVM does.
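
For reference, here is a rough C sketch of what the xorps in the LLVM output amounts to: negating a float by flipping its IEEE 754 sign bit, with no x87 involved. The names neg_by_sign_flip and f_sign_flip are made up for illustration and are not part of the testcase, and the sketch assumes float is a 32-bit IEEE 754 binary32 value, as it is on x86.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: negate a float by XORing its sign bit with
   0x80000000, the same constant LLVM loads from .LCPI0_0.
   Assumes float is 32-bit IEEE 754 binary32. */
static float neg_by_sign_flip(float a)
{
    uint32_t bits;
    memcpy(&bits, &a, sizeof bits);   /* reinterpret without aliasing issues */
    bits ^= UINT32_C(0x80000000);     /* flip only the sign bit */
    memcpy(&a, &bits, sizeof a);
    return a;
}

/* Same computation as f() in the report, with the negation spelled out. */
float f_sign_flip(float a)
{
    return neg_by_sign_flip(a) / a;
}
```

This should compute the same result as -a / a; it is only meant to show the operation the SSE sequence performs, not to suggest a source-level workaround.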
