https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79748

--- Comment #2 from Katsunori Kumatani <katsunori.kumatani at gmail dot com> ---
I tried -O3 -fipa-ra on the following example code, but it doesn't seem to do
what I suggested. (I used inline asm only to force the use of a callee-saved
register, purely for demonstration.)


#include <stdio.h>

static __attribute__((noinline)) int foo(int a)
{
  asm("incl %0":"+b"(a));  // use ebx just to demonstrate
  return a;
}

void bar(int x)
{
  asm("incl %0":"+b"(x));  // in caller as well
  printf("%d", foo(x));
}


And I get this (see the comments in the listing):


foo(int):
        pushq   %rbx           # saves rbx needlessly
        movl    %edi, %ebx
        incl    %ebx
        movl    %ebx, %eax
        popq    %rbx
        ret

bar(int):
        pushq   %rbx           # because this already saves it and
        movl    %edi, %ebx
        incl    %ebx
        movl    %ebx, %edi
        call    foo(int)
        popq    %rbx           # restores it here... (ABI)
        movl    %eax, %esi
        movl    $.LC0, %edi
        xorl    %eax, %eax
        jmp     printf


Since GCC knows that 'foo' is "internal" to the code (not externally visible,
its address is not taken, and with LTO it can know even across translation
units), it could optimize this without having to save/restore rbx in 'foo'.
'bar' does the right thing by saving 'rbx' (the ABI requires it), but 'foo' has
no need to save it again... therefore, this attribute would be applied to 'foo'.

Now I know that such an optimization might not be easy to add -- that's why I
did not ask for an optimization, but to use the *existing* interprocedural
optimizations of GCC. That attribute would help with that, because then it
won't save/restore rbx in 'foo' (due to the attribute).

Note that GCC's ipa-ra does work well, but only for registers that are
"unsaved", i.e. call-clobbered (caller-saved) ones.

For example, if you use 'ecx' in 'bar', it will *not* spill it across the
function 'foo' because it knows 'foo' does not modify / clobber 'rcx' at all.
That's basically what I'd like for 'rbx' and other callee-saved registers.

This attribute would simply give GCC more freedom in these situations. It is
probably more useful for x86-64 because it has more callee-saved registers,
and pushing/popping them every time in a function has more implications than the
4 used in 32-bit (though I'm sure it could be added for i386 too, if it gets
added at all). And this happens even if the caller doesn't necessarily require
it.

I suggested it because I figured it would be an easy and useful addition (it
may be useful for more things than just this particular situation). Unlike
no_caller_saved_registers, which has already been added, it doesn't have the
problem of other "non general purpose" registers not being saved, because those
don't get saved anyway.
