https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80586
Bug ID: 80586
Summary: vsqrtss with AVX should avoid a dependency on the destination register.
Product: gcc
Version: 8.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: peter at cordes dot ca
Target Milestone: ---
Target: x86_64-*-*, i?86-*-*

    #include <math.h>
    float sqrt_depcheck(float a, float b) {
        return sqrtf(b);
    }

compiles to (with gcc 8.0.0 20170429 -march=haswell -O3 -fno-math-errno):

    vsqrtss %xmm1, %xmm0, %xmm0
    ret

(a is unused; it only makes %xmm0 hold a value that has nothing to do with the result.) The AVX form of vsqrtss copies the upper elements of the destination from its middle operand, so this instruction has to wait for the old value of %xmm0 even though only the low element of %xmm1 matters.

Recent clang (4.0) avoids the unwanted dependency on %xmm0 by using the source register as *both* source operands:

    vsqrtss %xmm1, %xmm1, %xmm0
    ret

This of course doesn't work when the source is a different type (e.g. memory or, for int->float conversion, an integer register). See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80571 for a suggestion to track cold registers that can safely be used read-only without delaying out-of-order (OOO) execution, without putting vxorps-zeroing everywhere.

    float sqrt_from_mem(float *fp) {
        return sqrtf(*fp);
    }

ICC17 breaks the dependency on %xmm0 this way (a vmovss load writes all of %xmm0, zeroing the upper elements, so the following vsqrtss no longer depends on its old value):

    vmovss (%rdi), %xmm0
    vsqrtss %xmm0, %xmm0, %xmm0

gcc and clang both decide to risk it with:

    vsqrtss (%rdi), %xmm0, %xmm0

Code on https://godbolt.org/g/mJmjdh.
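Until the compiler picks these operands itself, a user-level workaround can be sketched with SSE intrinsics, assuming an x86-64 target (SSE is baseline there; add -mavx to get the VEX-encoded vsqrtss). _mm_sqrt_ss(v) takes the square root of the low element of v and copies the upper elements from v as well, so it mirrors clang's `vsqrtss %xmm1, %xmm1, %xmm0`; _mm_load_ss is a movss load that zeroes the upper elements, mirroring ICC's sequence. The helper names sqrt_nodep and sqrt_from_mem_nodep are hypothetical, and intrinsics do not guarantee exact instruction selection, so this is a sketch rather than a fix for the missed optimization reported here.

```c
#include <xmmintrin.h> /* SSE: _mm_set_ss, _mm_load_ss, _mm_sqrt_ss, _mm_cvtss_f32 */

/* Hypothetical helper: sqrt of a float held in a register, without a false
 * dependency on some unrelated register.  _mm_set_ss() builds a vector whose
 * low element is b and whose upper elements are zero (this may cost one extra
 * instruction).  _mm_sqrt_ss(v) then uses v both as the sqrt input and as the
 * source of the upper elements, like clang's vsqrtss %xmm1, %xmm1, %xmm0. */
static inline float sqrt_nodep(float b)
{
    __m128 v = _mm_set_ss(b);
    v = _mm_sqrt_ss(v);
    return _mm_cvtss_f32(v); /* extract the low element */
}

/* Hypothetical helper for the memory-source case.  _mm_load_ss() is a movss
 * load, which writes the whole register (upper elements zeroed) and therefore
 * does not depend on the register's old value, mirroring ICC17's
 *     vmovss  (%rdi), %xmm0
 *     vsqrtss %xmm0, %xmm0, %xmm0 */
static inline float sqrt_from_mem_nodep(const float *fp)
{
    __m128 v = _mm_load_ss(fp);
    v = _mm_sqrt_ss(v);
    return _mm_cvtss_f32(v);
}
```

Compiling with something like `gcc -O3 -march=haswell -S` (or on Godbolt) is the way to confirm which operands the compiler actually chose.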