[Bug target/82680] Use cmpXXss and cmpXXsd for setcc boolean compare

peter at cordes dot ca Tue, 24 Oct 2017 03:01:58 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82680


--- Comment #2 from Peter Cordes <peter at cordes dot ca> ---
gcc's sequence is *probably* good, as long as it uses xor / comisd / setcc and
not comisd / setcc / movzx (which gcc often likes to do for integer setcc).

(u)comisd and cmpeqsd both run on the FP add unit.  Agner Fog doesn't list a
latency for (u)comisd.  (It's hard to measure, because you'd need to construct
a round-trip from the flags back to FP.)  XOR-zeroing is as cheap as a NOP on
Intel SnB-family, but uses an execution port on AMD, so gcc's sequence is the
same number of front-end uops but fewer unfused-domain uops for the execution
units on SnB.  Also, the xor-zeroing is off the critical path on all CPUs.
(But ucomisd latency is probably as high as cmpeqsd + movd.)
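
For reference, the two strategies being compared can be sketched in C with
SSE2 intrinsics.  This is only an illustration: the function names are made up
for this sketch, a less-than compare is assumed as the example predicate, and
a compiler is free to emit different instructions than the comments suggest.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Flag-setting compare.  Hypothetical helper: a compiler would typically
 * implement this with comisd plus setcc, zeroing the result register with
 * xor or movzx. */
static int lt_via_flags(double x, double y)
{
    return _mm_comilt_sd(_mm_set_sd(x), _mm_set_sd(y));
}

/* Mask-producing compare.  Hypothetical helper: cmpltsd leaves all-ones or
 * all-zeros in the low element, movd copies the low 32 bits to an integer
 * register (-1 or 0), and the negation turns -1 into 1. */
static int lt_via_mask(double x, double y)
{
    __m128d mask = _mm_cmplt_sd(_mm_set_sd(x), _mm_set_sd(y));
    return -_mm_cvtsi128_si32(_mm_castpd_si128(mask));
}
```

Both helpers return 1 when x < y and 0 otherwise, including when either input
is NaN.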

Hmm, AMD bdver* and Ryzen take 2 uops for comisd, so for tune=generic it's
probably worth thinking about using ICC's sequence.

ICC's sequence is especially good if you're doing something with the integer
result that can optimize away the NEG.  (e.g. use it with AND instead of a CMOV
to conditionally zero something, or AND it with another condition).  Or if
you're storing the boolean result to memory, you can use psrld $31, %xmm0 or
PAND to turn the mask into 0 / 1, then movd directly to memory without going
through integer regs.
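
A rough sketch of those uses, again in C with SSE2 intrinsics.  The helper
names are hypothetical, and whether the compiler really keeps the store path
out of integer registers depends on the compiler and options.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: -1 (all bits set) if x == y, else 0.
 * Corresponds to cmpeqsd followed by movd, with no NEG. */
static int eq_mask(double x, double y)
{
    __m128d mask = _mm_cmpeq_sd(_mm_set_sd(x), _mm_set_sd(y));
    return _mm_cvtsi128_si32(_mm_castpd_si128(mask));
}

/* Conditionally zero a value with AND instead of a CMOV:
 * the -1 / 0 mask is used directly, so no NEG is needed. */
static int keep_if_equal(int value, double x, double y)
{
    return value & eq_mask(x, y);
}

/* Store a 0 / 1 boolean: shift the mask right by 31 while it is still in a
 * vector register (psrld $31), then write out the low 32 bits. */
static void store_eq_flag(int *dst, double x, double y)
{
    __m128i mask = _mm_castpd_si128(_mm_cmpeq_sd(_mm_set_sd(x), _mm_set_sd(y)));
    *dst = _mm_cvtsi128_si32(_mm_srli_epi32(mask, 31));
}
```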


comisd doesn't destroy either of its args, but cmpeqsd does (without AVX).  If
you want both x and y afterwards (e.g. if they weren't equal, or you care about
-0.0 and +0.0 being different even though they compare equal), then comisd is a
win.
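
As a small illustration of the signed-zero case, here is a hypothetical
function (not from the bug report) that needs both inputs again after the
equality compare:

```c
#include <math.h>  /* signbit */

/* Hypothetical helper: true only if x and y compare equal AND have the same
 * sign, so +0.0 and -0.0 are treated as different.  Both x and y are read
 * again after the compare, which is where a non-destructive compare helps. */
static int same_value_and_sign(double x, double y)
{
    return x == y && !signbit(x) == !signbit(y);
}
```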

So I think we need to look at the choices given some more surrounding code.

I'll hopefully look at this some more soon.
