https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97887
Bug ID: 97887
Summary: Failure to optimize neg plus div to avoid using x87 floating point stack
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---

float f(float a)
{
    return -a / a;
}

On x86 -O3, LLVM outputs this:

.LCPI0_0:
  .long 0x80000000 # float -0
  .long 0x80000000 # float -0
  .long 0x80000000 # float -0
  .long 0x80000000 # float -0
f(float):
  movaps xmm1, xmmword ptr [rip + .LCPI0_0] # xmm1 = [-0.0E+0,-0.0E+0,-0.0E+0,-0.0E+0]
  xorps xmm1, xmm0
  divss xmm1, xmm0
  movaps xmm0, xmm1
  ret

GCC outputs this:

f(float):
  movss DWORD PTR [rsp-4], xmm0
  fld DWORD PTR [rsp-4]
  movaps xmm1, xmm0
  fchs
  fstp DWORD PTR [rsp-4]
  movss xmm0, DWORD PTR [rsp-4]
  divss xmm0, xmm1
  ret

I'm *pretty sure* that loading the value into the x87 stack (especially mixed with SSE instructions) is much slower than using SSE instructions for this.
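
For reference, the xorps in the LLVM output negates the value by flipping its sign bit, using the -0.0f bit pattern (0x80000000) as the mask. Below is a minimal C sketch of that same operation, assuming float is a 32-bit IEEE-754 binary32 value. The names neg_via_xor, f_xor and self_check are hypothetical and only for illustration; this shows the arithmetic, and makes no claim about what code any particular compiler emits for it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: negate a float by flipping its IEEE-754 sign bit.
   Assumes float is 32-bit binary32; XORing with the bit pattern of -0.0f
   (0x80000000) is the operation the xorps above performs on lane 0. */
static float neg_via_xor(float a)
{
    uint32_t bits;
    memcpy(&bits, &a, sizeof bits);   /* well-defined type punning */
    bits ^= UINT32_C(0x80000000);     /* same mask value as .LCPI0_0 */
    memcpy(&a, &bits, sizeof a);
    return a;
}

/* Same computation as f() in the report, with the negation spelled out. */
static float f_xor(float a)
{
    return neg_via_xor(a) / a;
}

/* Small self-check; the values are exactly representable as floats. */
static void self_check(void)
{
    assert(neg_via_xor(1.5f) == -1.5f);
    assert(f_xor(2.0f) == -1.0f);
}
```

The point of the sketch is that the negation needs only a single bitwise operation on the SSE register, with no round trip through memory or the x87 stack.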