On 13-04-16 6:56 PM, Michael Meissner wrote:
I tracked down the bug with the spec 2006 benchmark WRF using the LRA register
allocator.

At one point LRA has decided to use the CTR to hold a CCmode value:

(insn 11019 11018 11020 16 (set (reg:CC 66 ctr [4411])
         (reg:CC 66 ctr [4411])) module_diffusion_em.fppized.f90:4885 360 {*movcc_internal1}
      (expr_list:REG_DEAD (reg:CC 66 ctr [4411])
         (nil)))

Now movcc_internal1 has moves from r->h (which includes ctr/lr) and ctr/lr->r,
but it doesn't have a move to cover the nop move of moving the ctr to the ctr.
IMHO, LRA should not be generating NOP moves that are later deleted.

There are two ways to solve the problem.  One is not to let anything but int
modes into CTR/LR, which will also prevent the register allocator from
spilling floating point values there (which we've seen in the past, but the
last time I tried to eliminate it I couldn't).  The following patch does this,
and also changes the assertion to call fatal_insn_not_found to make it clearer
what the error is.

I imagine I could add a NOP move insn to movcc_internal1, but that just
strikes me as wrong.

Note, this does not fix the 32-bit failure in dealII, and I also noticed that I
can't bootstrap the compiler using --with-cpu=power7, which I will get to
tomorrow.

2013-04-16  Michael Meissner  <meiss...@linux.vnet.ibm.com>

        * config/rs6000/rs6000.opt (-mconstrain-regs): New debug switch to
        control whether we only allow int modes to go in the CTR, LR,
        VRSAVE, VSCR registers.
        * config/rs6000/rs6000.c (rs6000_hard_regno_mode_ok): Likewise.
        (rs6000_debug_reg_global): If -mdebug=reg, print out if SPRs are
        constrained.
        (rs6000_option_override_internal): Set -mconstrain-regs if we are
        using the LRA register allocator.

        * lra.c (check_rtl): Use fatal_insn_not_found to report that an
        insn does not match its constraints.

Mike, thanks for the patch and all the SPEC2006 data (which is very useful, as I have no access to a Power machine that can be used for benchmarking). I guess some benchmark scores may be lower because LRA lacks some micro-optimizations that reload implements through the many Power hooks (e.g. LRA does not use push reload). That is not always a bad thing, though (e.g. LRA does not use SECONDARY_MEMORY_NEEDED_RTX, which permits reusing the stack slots for other useful things).

In general, I have the impression that power7 is the most difficult port for LRA. If we manage to port it, LRA ports for other targets will be easier.

I have also reproduced the bootstrap failure with --with-cpu=power7. I am going to work on that first, and after that on the SPEC2006 problems you wrote about.
