On 2019-09-30 10:40 a.m., Richard Sandiford wrote:
IRA's make_early_clobber_and_input_conflicts checks for cases in
which an output operand is likely to be an earlyclobber and an input
operand is unlikely to be tieable with it. If so, the allocno for
the output conflicts with the allocno for the input. This seems
to work well.
However, a similar situation arises if an output operand is likely
to be tied to one of a set of input operands X and if another input
operand has a different value from all of the operands in X.
E.g. if we have:
0: "=r, r"
1: "0, r"
2: "r, 0"
3: "r, r"
operand 0 will always be tied to operand 1 or operand 2, so if operand 3
is different from them both, operand 0 acts like an earlyclobber as far
as operand 3 (only) is concerned. The same is true for operand 2 in:
0: "=r"
1: "0"
2: "r"
In the second example, we'd normally have a copy between operand 1 and
operand 0 if operand 1 dies in the instruction, and so there's rarely
a problem. But if operand 1 doesn't die in the instruction, operand 0
still acts as an earlyclobber for operand 2 (if different from operand 1),
since in that case LRA must copy operand 1 to operand 0 before the
instruction.
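To make the hazard concrete, here is a minimal C sketch of the second example. It is a toy model, not GCC code: the register file, NREGS and tied_add are hypothetical names invented for illustration. It emulates the copy of operand 1 into operand 0 that has to happen before a tied instruction when operand 1 stays live.

```c
#include <assert.h>

/* Toy register file size; purely illustrative.  */
enum { NREGS = 4 };

/* Emulate "op0 = op1 + op2" for an instruction whose operand 0 must be
   tied to operand 1, in the case where operand 1 does not die.  R0, R1
   and R2 are the hard registers assigned to operands 0, 1 and 2.
   Returns the value left in operand 0's register.  */
static int
tied_add (int regs[NREGS], int r0, int r1, int r2)
{
  assert (r0 >= 0 && r0 < NREGS && r1 >= 0 && r1 < NREGS
          && r2 >= 0 && r2 < NREGS);

  /* The copy inserted before the instruction: operand 1 -> operand 0.
     If R0 == R2, this overwrites operand 2 before it is read.  */
  if (r0 != r1)
    regs[r0] = regs[r1];

  /* The two-address instruction itself: op0 += op2.  */
  regs[r0] += regs[r2];
  return regs[r0];
}
```

With a register file holding {0, 5, 7, 0}, tied_add (regs, 0, 1, 2) gives 12 as expected, whereas tied_add (regs, 2, 1, 2), in which operand 0 shares a register with operand 2, gives 10 because the copy destroys operand 2 first. That is the sense in which operand 0 behaves like an earlyclobber for operand 2.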
As the existing comment says:
Avoid introducing unnecessary conflicts by checking classes of the
constraints and pseudos because otherwise significant code
degradation is possible for some targets.
I think that's doubly true here. E.g. it's perfectly reasonable to have
constraints like:
0: "=r, r"
1: "0, r"
2: "r, r"
on targets like s390 that have shorter instructions for tied operands,
but that don't want the size difference to influence RA too much.
We shouldn't treat operand 0 as earlyclobber wrt operand 2 in that case.
This patch therefore treats a normal tied non-earlyclobber output as
being effectively earlyclobber wrt an input if it is so for *all*
preferred alternatives.
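The "all alternatives" rule can be sketched in C roughly as follows. This is a simplified model and not the code in ira-lives.c: it treats every alternative as preferred, ignores register classes, and only recognises a single-digit matching constraint. The names operand_alts, ties_to and effectively_earlyclobber are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum { MAX_ALTS = 4 };

/* One operand's constraint strings, one per alternative.  */
struct operand_alts
{
  const char *alt[MAX_ALTS];
};

/* True if constraint C is exactly the matching constraint for operand
   number OUT, e.g. "0" for operand 0.  */
static bool
ties_to (const char *c, int out)
{
  return c[0] == '0' + out && c[1] == '\0';
}

/* True if, in every alternative, output operand OUT is tied to some
   input other than IN.  In that case OUT behaves like an earlyclobber
   as far as IN is concerned (assuming IN holds a different value).  */
static bool
effectively_earlyclobber (const struct operand_alts *ops, int n_ops,
                          int n_alts, int out, int in)
{
  assert (n_alts > 0 && n_alts <= MAX_ALTS && out >= 0 && out <= 9);
  for (int a = 0; a < n_alts; a++)
    {
      bool tied_elsewhere = false;
      for (int i = 0; i < n_ops; i++)
        if (i != out && i != in && ties_to (ops[i].alt[a], out))
          tied_elsewhere = true;
      /* OUT is untied here, or tied to IN itself: no hazard in this
         alternative, so the conflict would be unnecessary.  */
      if (!tied_elsewhere)
        return false;
    }
  return true;
}
```

Under this model, the four-operand example gives true for operand 3 and false for operands 1 and 2, while the s390-style example gives false for operand 2 because its second alternative leaves operand 0 untied.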
My usual bogo-comparison of gcc.c-torture, gcc.dg and g++.dg
(this time using -Os -fno-schedule-insns{,2}) gives:
Target                Tests  Delta   Best  Worst  Median
======                =====  =====   ====  =====  ======
aarch64-linux-gnu         3     -3     -1     -1      -1
aarch64_be-linux-gnu      4     -4     -1     -1      -1
alpha-linux-gnu         136   -190    -56     84      -1
arc-elf                  31   -172    -27      3      -2
arm-linux-gnueabi        59   -996   -136      4      -1
arm-linux-gnueabihf      59   -996   -136      4      -1
bfin-elf                 22    -31    -19      8      -1
bpf-elf                 276   -388   -191     12      -1
cris-elf                 73     69    -18     26      -1
epiphany-elf             58    -91    -10      2      -1
fr30-elf                123   -156    -33     20      -1
h8300-elf               150   -426    -36     17      -2
hppa64-hp-hpux11.23      39    -65    -16      1      -1
i686-apple-darwin        93    -51    -29     26      -1
i686-pc-linux-gnu        43      8    -10     27      -1
m32r-elf                 68    -92    -31     14      -1
m68k-linux-gnu          169    -65    -23     33      -1
mcore-elf                27    -29    -14      8      -1
mmix                     25    -75    -28      2      -1
mn10300-elf             166     32    -46    149      -1
moxie-rtems             937   1461  -1649   6000      -1
msp430-elf               89  -1364   -835      5      -4
nds32le-elf              34    -54    -29      2      -1
pdp11                   252   -458    -23     13      -1
powerpc-ibm-aix7.0        3     -4     -2     -1      -1
powerpc64-linux-gnu       1     -1     -1     -1      -1
powerpc64le-linux-gnu     3     -3     -1     -1      -1
rl78-elf                  4    -12     -4     -2      -4
rx-elf                   59    -99    -11      2      -1
s390-linux-gnu          115   -117    -53     21      -1
s390x-linux-gnu         120    -47    -25     21      -1
sh-linux-gnu             54    -89    -31      8      -1
sparc64-linux-gnu        14     -6     -5      4      -1
tilepro-linux-gnu       209   -452    -55     16      -1
v850-elf                 10     18     -2     21      -1
vax-netbsdelf             5     -5     -1     -1      -1
x86_64-darwin            53    -62    -33      3      -1
x86_64-linux-gnu         52     -8     -8     13      -1
xstormy16-elf           144   -814   -541     25      -1
xtensa-elf              578  -2096   -138     15      -1
The eye-watering moxie-rtems +6,000 outlier is from gcc.dg/pr59992.c,
where the same code template is instantiated 10,000 times. In some
instances the code improves by one instruction, but it regresses by
one instruction in many more.
To emphasise that this is a poor metric (and is just to get a flavour),
most of the {i686,x86_64}-linux-gnu LOC increases happen in frame info
rather than code. I should try to script that out...
Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install?
Yes.
It is a very non-trivial patch. It took me some time to analyze it,
but the time was well spent :)
Thank you, Richard!
2019-09-30 Richard Sandiford <richard.sandif...@arm.com>
gcc/
* ira-lives.c (check_and_make_def_conflict): Handle cases in which
DEF is not a true earlyclobber but is tied to a specific input
operand, and so is effectively earlyclobber wrt inputs that have
different values.
(make_early_clobber_and_input_conflicts): Pass this case to the above.