https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85466

--- Comment #22 from Daniel Elliott <cpphackster at gmail dot com> ---
(In reply to Marc Glisse from comment #21)
> (In reply to Daniel Elliott from comment #20)
> > Still, clang is 1.64x faster. I had a look at the assembly. My limited
> > understanding makes me think that the ucomiss is not fully vectorized and
> > the clang one is (clang's ucomiss %xmm0,%xmm1 vs gcc's ucomiss
> > 0x218b4(%rip),%xmm0). Feel free to correct me if I am wrong.
> 
> Nothing gets vectorized (likely because of the "dontoptimize" code). The
> ucomiss difference is that llvm keeps the constant .5f in a register, while
> gcc reloads it every time. I don't know if the speed difference comes from
> that, or from some subtle tuning arrangement of the operations (I didn't try
> to understand why llvm has 4 mov where gcc has only 2).

Right, I assumed that because it used xmm0 it had to be a vector register. I'm
going to go and read up on assembly!
