On 13-04-16 6:56 PM, Michael Meissner wrote:
I tracked down the bug in the SPEC 2006 benchmark WRF when using the LRA
register allocator.
At one point LRA has decided to use the CTR to hold a CCmode value:
(insn 11019 11018 11020 16 (set (reg:CC 66 ctr [4411])
        (reg:CC 66 ctr [4411])) module_diffusion_em.fppized.f90:4885 360 {*movcc_internal1}
     (expr_list:REG_DEAD (reg:CC 66 ctr [4411])
        (nil)))
Now movcc_internal1 has moves from r->h (which includes ctr/lr) and ctr/lr->r,
but it doesn't have a move to cover the nop move of moving the ctr to the ctr.
IMHO, LRA should not be generating NOP moves that are later deleted.
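To make the mismatch concrete, here is a minimal sketch in C. It models only the two alternatives mentioned above (r->h and ctr/lr->r); the real movcc_internal1 pattern has more alternatives, and all names prefixed with sk_ are hypothetical and not part of GCC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical register classes: general registers and the special
   registers covered by the "h" constraint (ctr/lr among them).  */
enum sk_class { SK_GPR, SK_SPR };

struct sk_alt { enum sk_class dest, src; };

/* Only the two alternatives named in the message are modeled here.  */
static const struct sk_alt sk_movcc_alts[] = {
  { SK_SPR, SK_GPR },   /* r -> h */
  { SK_GPR, SK_SPR },   /* ctr/lr -> r */
};

/* Return true if some modeled alternative accepts a move DEST <- SRC.  */
bool sk_move_matches (enum sk_class dest, enum sk_class src)
{
  size_t n = sizeof sk_movcc_alts / sizeof sk_movcc_alts[0];
  for (size_t i = 0; i < n; i++)
    if (sk_movcc_alts[i].dest == dest && sk_movcc_alts[i].src == src)
      return true;
  return false;
}

/* A move is a nop in this sketch when both sides name the same hard
   register.  */
bool sk_is_noop_move (int dest_regno, int src_regno)
{
  assert (dest_regno >= 0 && src_regno >= 0);
  return dest_regno == src_regno;
}
```

With this model, sk_move_matches (SK_SPR, SK_SPR) is false, which corresponds to the ctr->ctr move in insn 11019 failing to match any alternative even though sk_is_noop_move (66, 66) is true.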
There are two ways to solve the problem. One is not to let anything but int
modes into CTR/LR, which will also stop the register allocator from
spilling floating point values there (which we've seen in the past, but the
last time I tried to eliminate it I couldn't). The following patch does this,
and also changes the assertion to call fatal_insn_not_found to make it clearer
what the error is.
The other way, I imagine, is to add a NOP move insn to movcc_internal1, but
that just strikes me as wrong.
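The first approach (only int modes in CTR/LR) can be sketched roughly as follows. This is a simplified stand-in, not the actual rs6000_hard_regno_mode_ok code: the mode enum, the sk_ names, and the constrain_regs parameter are hypothetical, and register number 65 for LR is an assumption of the sketch (only 66 for CTR appears in the dump above).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for a few machine modes.  */
enum sk_mode { SK_SImode, SK_DImode, SK_CCmode, SK_DFmode };

/* CTR = 66 is taken from the RTL dump; LR = 65 is assumed.  */
enum { SK_LR_REGNO = 65, SK_CTR_REGNO = 66 };

bool sk_is_int_mode (enum sk_mode mode)
{
  return mode == SK_SImode || mode == SK_DImode;
}

bool sk_is_spr (int regno)
{
  return regno == SK_LR_REGNO || regno == SK_CTR_REGNO;
}

/* When CONSTRAIN_REGS is set, LR and CTR accept only integer modes, so
   the allocator can never place a CC or floating point value there.
   Other registers are not modeled and are always accepted.  */
bool sk_hard_regno_mode_ok (int regno, enum sk_mode mode, bool constrain_regs)
{
  assert (regno >= 0);
  if (constrain_regs && sk_is_spr (regno))
    return sk_is_int_mode (mode);
  return true;
}
```

Under this sketch, a CCmode value in CTR is rejected once the constraint is on, so the allocator would never be offered the ctr->ctr CCmode move in the first place.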
Note, this does not fix the 32-bit failure in dealII, and I also noticed that I
can't bootstrap the compiler using --with-cpu=power7, which I will get to
tomorrow.
2013-04-16 Michael Meissner <meiss...@linux.vnet.ibm.com>
* config/rs6000/rs6000.opt (-mconstrain-regs): New debug switch to
control whether we only allow int modes to go in the CTR, LR,
VRSAVE, VSCR registers.
* config/rs6000/rs6000.c (rs6000_hard_regno_mode_ok): Likewise.
(rs6000_debug_reg_global): If -mdebug=reg, print out if SPRs are
constrained.
(rs6000_option_override_internal): Set -mconstrain-regs if we are
using the LRA register allocator.
* lra.c (check_rtl): Use fatal_insn_not_found to report constraint
does not match.
Mike, thanks for the patch and all the SPEC2006 data (which are very
useful, as I have no access to a Power machine that can be used for
benchmarking). I guess some benchmark scores may be lower because LRA
lacks some micro-optimizations that reload implements through many
Power hooks (e.g. LRA does not use push reload). Sometimes that is not
a bad thing, though (e.g. LRA does not use
SECONDARY_MEMORY_NEEDED_RTX, which permits reusing the stack slots for
other useful things).
In general, I got the impression that power7 is the most difficult port
for LRA. If we manage to port it, LRA ports for other targets will be easier.
I also reproduced the bootstrap failure with --with-cpu=power7. I am going to
work on this and, after that, on the SPEC2006 results you wrote about.