Hi, Steven

Thanks for investigating this. This presumably was the reason that Vlad changed the constraint modifier for that pattern in his patch for LRA. I don't think that using memory is an improvement, but Mike is the best person to comment.
Thanks, David

On Sat, Nov 2, 2013 at 6:48 PM, Steven Bosscher <stevenb....@gmail.com> wrote:
> Hello,
>
> Today's powerpc64-linux gcc has 2 extra failures with -mlra vs. reload
> (i.e. svn unpatched).
>
> (I'm excluding guality failure differences here because there are too
> many of them that seem to fail at random after minimal changes
> anywhere in the compiler...).
>
> Test results are posted here:
> reload: http://gcc.gnu.org/ml/gcc-testresults/2013-11/msg00128.html
> lra: http://gcc.gnu.org/ml/gcc-testresults/2013-11/msg00129.html
>
> The new failures and total score is as follows (+=lra, -=reload):
> +FAIL: gcc.target/powerpc/pr53199.c scan-assembler-times stwbrx 6
> +FAIL: gcc.target/powerpc/pr58330.c scan-assembler-not stwbrx
>
>                 === gcc Summary ===
>
> -# of expected passes            97887
> -# of unexpected failures        536
> +# of expected passes            97903
> +# of unexpected failures        538
>  # of unexpected successes       38
>  # of expected failures          244
> -# of unsupported tests          1910
> +# of unsupported tests          1892
>
>
> The failure of pr53199.c is because of different instruction selection
> for bswap. Test case is reduced to just one function:
>
> /* { dg-options "-O2 -mcpu=power6 -mavoid-indexed-addresses" } */
> long long
> reg_reverse (long long x)
> {
>   return __builtin_bswap64 (x);
> }
>
> Reload left vs. LRA right:
> reg_reverse:                       reg_reverse:
>     srdi 8,3,32                  |     addi 8,1,-16
>     rlwinm 7,3,8,0xffffffff      |     srdi 10,3,32
>     rlwinm 9,8,8,0xffffffff      |     addi 9,8,4
>     rlwimi 7,3,24,0,7            |     stwbrx 3,0,8
>     rlwimi 7,3,24,16,23          |     stwbrx 10,0,9
>     rlwimi 9,8,24,0,7            |     ld 3,-16(1)
>     rlwimi 9,8,24,16,23          <
>     sldi 7,7,32                  <
>     or 7,7,9                     <
>     mr 3,7                       <
>     blr                                blr
>
> This same difference is responsible for the failure of pr58330.c which
> also uses __builtin_bswap64().
>
> The difference in RTL for the test case is this (after reload vs.
> after LRA):
> - 11: {%7:DI=bswap(%3:DI);clobber %8:DI;clobber %9:DI;clobber %10:DI;}
> - 20: %3:DI=%7:DI
> + 20: %8:DI=%1:DI-0x10
> + 21: %8:DI=%8:DI // stupid no-op move
> + 11: {[%8:DI]=bswap(%3:DI);clobber %9:DI;clobber %10:DI;clobber scratch;}
> + 19: %3:DI=[%1:DI-0x10]
>
> So LRA believes going through memory is better than using a register,
> even though obviously there are plenty registers available.
>
> What LRA does:
>   Creating newreg=129
>   Removing SCRATCH in insn #11 (nop 2)
>   Creating newreg=130
>   Removing SCRATCH in insn #11 (nop 3)
>   Creating newreg=131
>   Removing SCRATCH in insn #11 (nop 4)
>   // at this point the insn would be a bswapdi2_64bit:
>   // 11: {%3:DI=bswap(%3:DI);clobber r129;clobber r130;clobber r131;}
>   // cost calculation for the insn alternatives:
>     0 Early clobber: reject++
>     1 Non-pseudo reload: reject+=2
>     1 Spill pseudo in memory: reject+=3
>     2 Scratch win: reject+=2
>     3 Scratch win: reject+=2
>     4 Scratch win: reject+=2
>   alt=0,overall=18,losers=1,rld_nregs=0
>     0 Non-pseudo reload: reject+=2
>     0 Spill pseudo in memory: reject+=3
>     0 Non input pseudo reload: reject++
>     2 Scratch win: reject+=2
>     3 Scratch win: reject+=2
>   alt=1,overall=16,losers=1,rld_nregs=0
>     Staticly defined alt reject+=12
>     0 Early clobber: reject++
>     2 Scratch win: reject+=2
>     3 Scratch win: reject+=2
>     4 Scratch win: reject+=2
>     0 Conflict early clobber reload: reject--
>   alt=2,overall=24,losers=1,rld_nregs=0
>   Choosing alt 1 in insn 11: (0) Z (1) r (2) &b (3) &r (4)
>     X {*bswapdi2_64bit}
>   Change to class BASE_REGS for r129
>   Change to class GENERAL_REGS for r130
>   Creating newreg=132 from oldreg=3, assigning class NO_REGS to r132
>   Change to class NO_REGS for r131
>   11: {r132:DI=bswap(%3:DI);clobber r129:DI;clobber r130:DI;clobber r131:DI;}
>     REG_UNUSED r131:DI
>     REG_UNUSED r130:DI
>     REG_UNUSED r129:DI
>
> LRA selects alternative 1 (Z,r,&b,&r,X) which seems to be the right
> choice, from looking at the constraints. Reload selects alternative 2
> which is slightly^2 discouraged: (??&r,r,&r,&r,&r).
>
> Is this an improvement or a regression? If it's an improvement then
> these two test cases should be adjusted :-)
>
> Ciao!
> Steven
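
A side note on the cost figures in the quoted dump: the "overall" values look consistent with summing the logged reject adjustments for an alternative and adding a fixed penalty per loser. The C sketch below recomputes them from the numbers in the log. The multiplier of 6 and every identifier here are assumptions chosen because they reproduce the logged totals; they are not copied from the LRA sources.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed penalty per operand that still needs a reload ("loser").
   6 is picked only because it reproduces the totals in the dump. */
enum { LOSER_COST_FACTOR = 6 };

#define COUNT(a) (sizeof (a) / sizeof (a)[0])

/* Reject adjustments as logged for insn 11, one array per alternative. */
static const int alt0_rejects[] = { 1, 2, 3, 2, 2, 2 };   /* early clobber, non-pseudo reload,
                                                             spill to memory, 3x scratch win */
static const int alt1_rejects[] = { 2, 3, 1, 2, 2 };      /* non-pseudo reload, spill to memory,
                                                             non-input pseudo reload, 2x scratch win */
static const int alt2_rejects[] = { 12, 1, 2, 2, 2, -1 }; /* static "??" reject, early clobber,
                                                             3x scratch win, conflict early clobber */

/* Sum the individual reject adjustments of one alternative. */
int sum_rejects(const int *adj, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += adj[i];
    return total;
}

/* overall = losers * factor + accumulated reject (assumed formula). */
int overall_cost(int losers, int reject)
{
    assert(losers >= 0);
    return losers * LOSER_COST_FACTOR + reject;
}

/* Index of the alternative with the lowest overall cost; the log
   reports losers=1 for all three alternatives. */
int cheapest_alternative(void)
{
    const int overall[3] = {
        overall_cost(1, sum_rejects(alt0_rejects, COUNT(alt0_rejects))),
        overall_cost(1, sum_rejects(alt1_rejects, COUNT(alt1_rejects))),
        overall_cost(1, sum_rejects(alt2_rejects, COUNT(alt2_rejects))),
    };
    int best = 0;
    for (int i = 1; i < 3; i++)
        if (overall[i] < overall[best])
            best = i;
    return best;
}
```

Under these assumptions the three totals come out as 18, 16 and 24, and cheapest_alternative() returns 1, which agrees with the "Choosing alt 1" line. It also shows that alternative 2 loses mainly because of the static reject of 12, not because of anything specific to the register allocation state.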