https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103550
--- Comment #6 from cqwrteur <unlvsur at live dot com> ---
(In reply to Andrew Pinski from comment #5)
> (In reply to cqwrteur from comment #4)
> > (In reply to Andrew Pinski from comment #2)
> > > Looks like it is a register allocation/scheduling issue. The extra
> > > instructions are mov.
> >
> > Are there good algorithms that can allocate registers optimally?
>
> Note the move instructions might be "free" on most modern x86 machines; they
> just take up icache space and decode time.
> Having so few registers and a 2-operand instruction set makes
> register allocation a hard problem, really. Yes, LLVM might get it right in
> this testcase, but there are others where GCC might do a better job.

I know. I am just investigating why compilers generate less optimal assembly than OpenSSL for SHA-512:
https://github.com/tearosccebe/fast_io/blob/988d75ddb4af7c745df97124a6f3d1842936bfa3/include/fast_io_crypto/hash/sha512_scalar.h#L20

For one round, GCC generates 55 instructions, while OpenSSL needs only 47. The performance difference is quite noticeable, since the higher register pressure here may force extra trivial loads/stores to memory to save temporaries (spills).
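For reference, here is a minimal C++ sketch of one SHA-512 compression round as defined in FIPS 180-4. It is not the fast_io code linked above, and the names `State`, `sha512_round`, `ch_fast` and `maj_fast` are hypothetical. It shows what has to stay live during a round: the eight working variables `a`..`h`, the round constant `k`, the schedule word `w`, and the temporaries `t1`/`t2`. That is most of the 16 general-purpose registers on x86-64 before counting the intermediates of the rotates. It also includes two standard bitwise identities that compute Ch and Maj with fewer logical operations. Hand-tuned code commonly uses rewrites like these, but whether they account for the 55 vs 47 instruction gap in this report is not established here.

```cpp
#include <cstdint>

using u64 = std::uint64_t;

// Rotate right; n must be in [1, 63] (a shift by 64 would be undefined behavior).
constexpr u64 rotr(u64 x, unsigned n) { return (x >> n) | (x << (64 - n)); }

// SHA-512 round functions from FIPS 180-4.
constexpr u64 big_sigma0(u64 x) { return rotr(x, 28) ^ rotr(x, 34) ^ rotr(x, 39); }
constexpr u64 big_sigma1(u64 x) { return rotr(x, 14) ^ rotr(x, 18) ^ rotr(x, 41); }
constexpr u64 ch(u64 x, u64 y, u64 z) { return (x & y) ^ (~x & z); }          // 4 ops
constexpr u64 maj(u64 x, u64 y, u64 z) { return (x & y) ^ (x & z) ^ (y & z); } // 5 ops

// Equivalent forms with fewer operations (hypothetical names, standard identities).
// ch_fast: where x is 1 pick y, else pick z.
constexpr u64 ch_fast(u64 x, u64 y, u64 z) { return ((y ^ z) & x) ^ z; }
// maj_fast: if x == y the majority is y, otherwise it is z.
// Since b and c shift down by one each round, this round's (a ^ b) equals the
// next round's (b ^ c), so one xor can be carried across rounds.
constexpr u64 maj_fast(u64 x, u64 y, u64 z) { return ((x ^ y) & (y ^ z)) ^ y; }

// The eight working variables of the compression function.
struct State { u64 a, b, c, d, e, f, g, h; };

// One compression round with round constant k and message-schedule word w.
constexpr State sha512_round(State s, u64 k, u64 w) {
    u64 t1 = s.h + big_sigma1(s.e) + ch_fast(s.e, s.f, s.g) + k + w;
    u64 t2 = big_sigma0(s.a) + maj_fast(s.a, s.b, s.c);
    // Variables shift down by one; only a and e receive new values.
    return State{t1 + t2, s.a, s.b, s.c, s.d + t1, s.e, s.f, s.g};
}

// With an all-zero state, T1 reduces to k + w and T2 to 0.
static_assert(sha512_round(State{0, 0, 0, 0, 0, 0, 0, 0}, 1, 2).a == 3,
              "zero state: a' = k + w");
```

In a real implementation the 80 rounds are fully unrolled and the variable shift becomes pure renaming, so the instruction count per round comes down to how many of these values the register allocator can keep in registers at once.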