On Tue, Sep 15, 2015 at 2:11 PM, Robin Dapp <rd...@linux.vnet.ibm.com> wrote: > Hi, > > recently, I came across a problem that keeps a load instruction in a > loop although it is loop-invariant. > > A simple example is: > > #include <stdio.h> > > #define SZ 256 > int a[SZ], b[SZ], c[SZ]; > > int main() { > > int i; > for (i = 0; i < SZ; i++) { > a[i] = b[i] + c[i]; > } > > printf("%d\n", a[0]); > } > > The resulting loop code in s390 assembly is: > (built with -march=z196 -O3) > > .L2: > l %r3, 0(%r1, %r5) # load b[i] > a %r3, 0(%r1, %r4) # add c[i] > st %r3, 0(%r1, %r12) # store a[i] > larl %r3, a # <-- load a, why? > aghi %r1, 4 # incr i > brctg %r2, .L2 # branch > > followed by the printf call: > > lgf %r3, 0(%r3) # load a[0] > larl %r2, .LC1 # load "%d\n" > brasl %r14, printf # call printf > > The cse1 pass detects that "larl %r3, a" and "lgf %r3, 0(%r3)" > can use the same internal register and unifies them. > > loop2_invariant then realizes that "larl %r3, a" is loop-invariant and > moves it out of the loop leaving a register-register move in the loop > that should be removed in some subsequent pass. > However, cprop3 replaces the source register of this move by the > constant symbolref "a" again,
To me that sounds like the real issue. cprop shouldn't undo what invariant motion did (or the copy should be propagated out before cprop) > the moved "larl %r3, a" is removed by > cse2 directly afterwards and we are back to square one... > (I could also post the corresponding RTL if requested) > > The problem begins, imho, when cse1 extends the BB of the loop to the BB > after the loop and uses the same register for the load in the loop as > well as after the loop. This prevents PRE, cprop and loop2_invariant > from correctly moving the load outside of the loop. > > I wonder if this behavior is intended in general or if I'm missing > something? On i386, my example works because the load of "a" before > printf is handled differently. > > I wrote a patch that disables basic block extension out of loops in the > first CSE pass for s390 which fixes the problem for me. It uses a target > hook and should only affect s390 but is not yet thoroughly tested. > > Any opinions/remarks on this? > > Regards > Robin