https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85466

--- Comment #21 from Marc Glisse <glisse at gcc dot gnu.org> ---
(In reply to Daniel Elliott from comment #20)
> Still, clang is 1.64x faster. I had a look at the assembly. My limited
> understanding makes me think that the ucomiss is not fully vectorized and
> the clang one is (clang's ucomiss %xmm0,%xmm1 vs gcc's ucomiss
> 0x218b4(%rip),%xmm0). Feel free to correct me if I am wrong.

Nothing gets vectorized (likely because of the "dontoptimize" code). The
ucomiss difference is that llvm keeps the constant .5f in a register, while gcc
reloads it from memory (the RIP-relative operand) on every comparison. I don't
know whether the speed difference comes from that or from some subtler
arrangement of the operations for tuning purposes (I didn't try to understand
why llvm has 4 mov where gcc has only 2).
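To illustrate the register-vs-memory distinction, here is a minimal hypothetical loop (not the code from this report; the function name count_above_half is invented for illustration). The constant 0.5f is loop-invariant, so when the loop is left scalar the compiler may either keep it in an XMM register for the whole loop, like clang's `ucomiss %xmm0,%xmm1` above, or use it as a RIP-relative memory operand on every compare, like gcc's `ucomiss 0x218b4(%rip),%xmm0`. Both forms are correct; they differ only in code generation, and which one a given compiler picks is not guaranteed.

```c
#include <stddef.h>

/* Hypothetical example: count the elements strictly greater than 0.5f.
   0.5f is loop-invariant. If the loop is not vectorized, the comparison
   becomes a scalar SSE compare, and the constant may either live in a
   register for the whole loop or be read from memory on every iteration. */
size_t count_above_half(const float *v, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        if (v[i] > 0.5f)
            ++count;
    }
    return count;
}
```

For example, with the input {0.1f, 0.5f, 0.75f, 2.0f, -1.0f} it returns 2, because 0.5f itself is not strictly greater than the constant. Comparing the `-S` output of gcc and clang for such a loop is one way to see which operand form each compiler chooses.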
