https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103550

--- Comment #6 from cqwrteur <unlvsur at live dot com> ---
(In reply to Andrew Pinski from comment #5)
> (In reply to cqwrteur from comment #4)
> > (In reply to Andrew Pinski from comment #2)
> > > Looks like it is a register allocation/scheduling issue. The extra
> > > instructions are mov.
> > 
> > Are there good algos that can allocate registers optimal?
> 
> note the move instructions might be "free" on most modern x86 machine, it
> just takes up icache space and decode time.
> having so little registers and having a 2 operand instruction set makes
> register allocation a hard problem really. Yes LLVM might get it right in
> this testcase but there are others where GCC might do a better job.

I know. I am just investigating why compilers generate less optimal assembly
than OpenSSL does for SHA-512.

https://github.com/tearosccebe/fast_io/blob/988d75ddb4af7c745df97124a6f3d1842936bfa3/include/fast_io_crypto/hash/sha512_scalar.h#L20

For one round, GCC generates 55 instructions while OpenSSL needs only 47.
The performance difference is quite noticeable, since poorer register
allocation here can add trivial loads and stores to memory just to save
temporaries.
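
For reference, here is a minimal sketch of the round in question, written
from the FIPS 180-4 definition. It is not the code from fast_io or OpenSSL;
the names rotr64 and sha512_round and the array layout are my own
assumptions for illustration. It shows why register pressure matters: eight
64-bit working variables are live across every round, plus the temporaries,
and x86-64 only has 16 general-purpose registers.

```cpp
#include <array>
#include <cstdint>

// Rotate a 64-bit word right by n bits (valid for 0 < n < 64).
constexpr std::uint64_t rotr64(std::uint64_t x, unsigned n) noexcept
{
    return (x >> n) | (x << (64u - n));
}

// One SHA-512 round per FIPS 180-4 (hypothetical helper, not fast_io's API).
// s holds the working variables a..h in that order; kw is K[t] + W[t].
inline void sha512_round(std::array<std::uint64_t, 8>& s, std::uint64_t kw) noexcept
{
    std::uint64_t a = s[0], b = s[1], c = s[2], d = s[3];
    std::uint64_t e = s[4], f = s[5], g = s[6], h = s[7];

    // Big sigma functions: three rotates and two xors each.
    std::uint64_t const sigma1 = rotr64(e, 14) ^ rotr64(e, 18) ^ rotr64(e, 41);
    std::uint64_t const sigma0 = rotr64(a, 28) ^ rotr64(a, 34) ^ rotr64(a, 39);

    // Ch selects bits of f or g depending on e; Maj is the bitwise majority.
    std::uint64_t const ch  = (e & f) ^ (~e & g);
    std::uint64_t const maj = (a & b) ^ (a & c) ^ (b & c);

    std::uint64_t const t1 = h + sigma1 + ch + kw;
    std::uint64_t const t2 = sigma0 + maj;

    // Shift the working variables down by one and inject t1 and t2.
    s = {t1 + t2, a, b, c, d + t1, e, f, g};
}
```

Hand-written implementations usually unroll the rounds and rename the
variables instead of shifting them, so the moves in the last statement
disappear; whether a compiler reaches the same result depends on its
register allocator and scheduler.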
