https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107167

Bug ID: 107167
Summary: It looks like GCC wastes registers on trivial computations when result can be cached
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: unlvsur at live dot com
Target Milestone: ---

I do not know whether this is a big issue on targets that provide plenty of
registers (like aarch64 or loongarch64). However, it looks like a big issue for
x86_64, which provides only 16 general-purpose registers (and %rsp is reserved,
so 15 are available).

Take this example:

https://godbolt.org/z/77rEsr1PG

#include <bit>

unsigned Sigma1(unsigned x) noexcept
{
        return std::rotr(x,6)^std::rotr(x,11)^std::rotr(x,25);
}

GCC generates code like this to avoid dependencies:

Sigma1m(unsigned int):
        movl    %edi, %eax
        movl    %edi, %edx
        roll    $7, %edi
        rorl    $6, %eax
        rorl    $11, %edx
        xorl    %edx, %eax
        xorl    %edi, %eax
        ret

However, this version saves one register (%edx is not needed) by rotating the
already-rotated value in %edi a further 19 bits, since 6 + 19 = 25:

mySigma1m(unsigned int):
        movl    %edi, %eax
        rorl    $6, %edi
        rorl    $11, %eax
        xorl    %edi, %eax
        rorl    $19, %edi
        xorl    %edi, %eax
        ret

That becomes a serious problem in computation-heavy code where registers are in
short supply. The first version also needs one more instruction, which can
affect the instruction cache.

Aggressively utilizing all registers may not give the best results. A local
maximum is not necessarily a global maximum. I don't know.