https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107167

            Bug ID: 107167
           Summary: It looks like GCC wastes registers on trivial
                    computations when result can be cached
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: unlvsur at live dot com
  Target Milestone: ---

I do not know whether this matters much on targets that provide plenty of
registers (such as aarch64 or loongarch64). However, it looks like a real
problem on x86_64, which provides only 16 general-purpose registers (and since
%rsp is reserved, only 15 are usable).
Take this example:

https://godbolt.org/z/77rEsr1PG

#include <bit>

unsigned Sigma1(unsigned x) noexcept
{
    return std::rotr(x, 6) ^ std::rotr(x, 11) ^ std::rotr(x, 25);
}


GCC generates the following code, apparently to keep the three rotations
independent of one another:
Sigma1m(unsigned int):
        movl    %edi, %eax
        movl    %edi, %edx
        roll    $7, %edi
        rorl    $6, %eax
        rorl    $11, %edx
        xorl    %edx, %eax
        xorl    %edi, %eax
        ret

However, the following sequence computes the same result:
mySigma1m(unsigned int):
        movl    %edi, %eax
        rorl    $6, %edi
        rorl    $11, %eax
        xorl    %edi, %eax
        rorl    $19, %edi
        xorl    %edi, %eax
        ret

This version saves one register (two instead of three). That becomes a serious
problem in code with a lot of computation, where registers are in short supply.
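
The second sequence relies on rotations composing: rotating right by 6 and then
by 19 is the same as rotating right by 25. Below is a minimal sketch of that
identity in C++. The names rotr32, sigma1_direct, sigma1_chained and
check_samples are hypothetical helpers made up for this illustration, and
rotr32 is a hand-written stand-in for std::rotr so the snippet does not depend
on C++20.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for std::rotr on 32-bit values; valid for 0 < n < 32.
constexpr std::uint32_t rotr32(std::uint32_t x, unsigned n) noexcept
{
    return (x >> n) | (x << (32u - n));
}

// Direct form, as in the report: three independent rotations of x.
constexpr std::uint32_t sigma1_direct(std::uint32_t x) noexcept
{
    return rotr32(x, 6) ^ rotr32(x, 11) ^ rotr32(x, 25);
}

// Chained form, mirroring the second assembly sequence: the rotate-by-6
// result is rotated a further 19 bits to produce the rotate-by-25 term.
constexpr std::uint32_t sigma1_chained(std::uint32_t x) noexcept
{
    return rotr32(x, 11) ^ rotr32(x, 6) ^ rotr32(rotr32(x, 6), 19);
}

// Compile-time check on one sample value.
static_assert(sigma1_direct(0x12345678u) == sigma1_chained(0x12345678u),
              "chained and direct forms must agree");

// Run-time spot check on a few more inputs.
inline void check_samples() noexcept
{
    constexpr std::uint32_t samples[] = {0u, 1u, 0x80000000u,
                                         0xdeadbeefu, 0xffffffffu};
    for (std::uint32_t x : samples)
        assert(sigma1_direct(x) == sigma1_chained(x));
}
```

The chained form trades one register for a dependency between the rotate by 6
and the rotate by 19, which is presumably the dependency GCC's version avoids.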

The first version also needs one more instruction, which increases code size
and can add pressure on the instruction cache.

Aggressively using every available register may not give the best overall
result: a local optimum is not necessarily a global optimum. I am not sure
which trade-off is right here.
