Vladimir,
this patch adds an analysis of the hard register usage of functions, for use
by IRA.
The patch:
- adds analysis in pass_final to track which hard registers are set or clobbered
by the function body, and stores that information in a struct cgraph_node.
- adds a target hook fn_other_hard_reg_usage to list hard registers that are
set or clobbered by a call to a function, but are not listed as such in the
function body, for instance registers clobbered by veneers inserted by the
linker.
- adds a reg-note REG_CALL_DECL, to be able to easily link call_insns to their
corresponding declaration, even after the calls may have been split into an
insn (set register to function address) and a call_insn (call register),
which can happen for instance on sh, and on mips with -mabicalls.
- uses the register analysis in IRA.
- adds an option -fuse-caller-save to control the optimization, on by default
at -Os and -O2 and higher.
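
To illustrate the analysis and its use as listed above, here is a minimal
standalone sketch. It is not code from the patch and does not use GCC's
internal types: hard_reg_mask, struct insn_model and the model_* functions are
hypothetical names invented for this example, and the 32-bit mask (bit n
stands for hard register n) is an assumption that only fits a target with at
most 32 hard registers. Intersecting the callee's usage with the
call-clobbered set is likewise just one plausible way to combine the two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: one bit per hard register, bit n = register n.  */
typedef uint32_t hard_reg_mask;

/* Hypothetical stand-in for an insn: only the registers it sets or
   clobbers are recorded.  */
struct insn_model
{
  hard_reg_mask set_or_clobbered;
};

/* Union of the registers set or clobbered by every insn of a function
   body, plus TARGET_EXTRA, the registers a target would report in
   addition (for instance registers clobbered by linker veneers).  */
static hard_reg_mask
model_collect_usage (const struct insn_model *insns, size_t n_insns,
                     hard_reg_mask target_extra)
{
  hard_reg_mask used = target_extra;
  for (size_t i = 0; i < n_insns; i++)
    used |= insns[i].set_or_clobbered;
  return used;
}

/* Registers to treat as clobbered by a call.  If the callee's usage is
   known, narrow the ABI's call-clobbered set CALL_USED to what the
   callee actually touches; otherwise fall back to CALL_USED.  */
static hard_reg_mask
model_call_clobbers (int callee_known, hard_reg_mask callee_usage,
                     hard_reg_mask call_used)
{
  return callee_known ? (call_used & callee_usage) : call_used;
}

/* Nonzero if REGNO can hold a value across a call with CLOBBERS.  */
static int
model_reg_survives_call (hard_reg_mask clobbers, int regno)
{
  assert (regno >= 0 && regno < 32);
  return (clobbers & ((hard_reg_mask) 1 << regno)) == 0;
}
```

In this model, a callee whose body only sets register 2 leaves register 3
available across the call, while for a callee with unknown usage every
register in the call-clobbered set passed by the caller is treated as
clobbered.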
The patch (original version by Radovan Obradovic) is similar to your patch
( http://gcc.gnu.org/ml/gcc-patches/2007-01/msg01625.html ) from 2007.
But this patch doesn't implement save area stack slot sharing.
( Btw, I've borrowed the struct cgraph_node field name and comment from the 2007
patch ).
[ Steven, you mentioned in this discussion
( http://gcc.gnu.org/ml/gcc/2012-10/msg00213.html ) that you are working on
porting the 2007 patch to trunk. What is the status of that effort?
]
As an example of the functionality, consider foo and bar from test-case aru-1.c:
...
static int __attribute__((noinline))
bar (int x)
{
return x + 3;
}
int __attribute__((noinline))
foo (int y)
{
return y + bar (y);
}
...
Compiled at -O2, bar only sets register $2 (the first return register):
...
bar:
.frame $sp,0,$31 # vars= 0, regs= 0/0, args= 0, gp= 0
.mask 0x00000000,0
.fmask 0x00000000,0
.set noreorder
.set nomacro
j $31
addiu $2,$4,3
...
foo then can use register $3 (the second return register) instead of register
$16 to save the value in register $4 (the first argument register) over the
call, as demonstrated here in a -fno-use-caller-save vs. -fuse-caller-save diff:
...
foo:                                     foo:
# vars= 0, regs= 2/0, args= 16, gp= 8  | # vars= 0, regs= 1/0, args= 16, gp= 8
.frame $sp,32,$31                        .frame $sp,32,$31
.mask 0x80010000,-4                    | .mask 0x80000000,-4
.fmask 0x00000000,0                      .fmask 0x00000000,0
.set noreorder                           .set noreorder
.set nomacro                             .set nomacro
addiu $sp,$sp,-32                        addiu $sp,$sp,-32
sw $31,28($sp)                           sw $31,28($sp)
sw $16,24($sp)                         <
.option pic0                             .option pic0
jal bar                                  jal bar
.option pic2                             .option pic2
move $16,$4                            | move $3,$4
lw $31,28($sp)                           lw $31,28($sp)
addu $2,$2,$16                         | addu $2,$2,$3
lw $16,24($sp)                         <
j $31                                    j $31
addiu $sp,$sp,32                         addiu $sp,$sp,32
...
That way we skip the save and restore that register $16 requires and that
register $3 does not. Btw, a further improvement could be to reuse $4 after
the call, and eliminate the move.
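
The first operand of the .mask directive in the listings above is a bitmask
of the general-purpose registers saved in the prologue, so the change from
0x80010000 to 0x80000000 says the same thing as the regs= count going from 2
to 1. A small sketch for decoding such a value follows; the helper names
gpr_is_saved and count_saved_gprs are hypothetical and only serve this
example.

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero if bit REGNO is set in MASK, i.e. if register $REGNO is
   listed as saved by a .mask value.  Hypothetical helper.  */
static int
gpr_is_saved (uint32_t mask, int regno)
{
  assert (regno >= 0 && regno < 32);
  return (int) ((mask >> regno) & 1u);
}

/* Number of registers listed as saved by a .mask value.
   Hypothetical helper.  */
static int
count_saved_gprs (uint32_t mask)
{
  int count = 0;
  for (int regno = 0; regno < 32; regno++)
    count += gpr_is_saved (mask, regno);
  return count;
}
```

For 0x80010000 this reports bits 31 and 16, matching the sw of $31 and $16 in
the left column; for 0x80000000 only bit 31 remains.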
A version of this patch on top of 4.6 ran into trouble with the epilogue on arm,
where a register was clobbered by a stack pop instruction, while that was not
visible in the rtl representation. This instruction was introduced in
arm_output_epilogue by code marked with the comment 'pop call clobbered
registers if it avoids a separate stack adjustment'.
I cannot reproduce that issue on trunk. Looking at the generated rtl, it seems
that the epilogue instructions now list all registers set by them, so
collect_fn_hard_reg_usage is able to analyze all clobbered registers.
Bootstrapped and reg-tested on x86_64, including Ada. Built and reg-tested on
mips, arm, ppc and sh. No issues found. OK for stage1 trunk?