On Tue, Sep 15, 2015 at 2:11 PM, Robin Dapp <rd...@linux.vnet.ibm.com> wrote:
> Hi,
>
> recently, I came across a problem that keeps a load instruction in a
> loop although it is loop-invariant.
>
> A simple example is:
>
> #include <stdio.h>
>
> #define SZ 256
> int a[SZ], b[SZ], c[SZ];
>
> int main() {
>
>   int i;
>   for (i = 0; i < SZ; i++) {
>     a[i] = b[i] + c[i];
>   }
>
>   printf("%d\n", a[0]);
> }
>
> The resulting loop code in s390 assembly is:
> (built with -march=z196 -O3)
>
> .L2:
>   l     %r3, 0(%r1, %r5)  # load b[i]
>   a     %r3, 0(%r1, %r4)  # add c[i]
>   st    %r3, 0(%r1, %r12) # store a[i]
>   larl  %r3, a            # <-- load a, why?
>   aghi  %r1, 4            # incr i
>   brctg %r2, .L2          # branch
>
> followed by the printf call:
>
>   lgf   %r3, 0(%r3)       # load a[0]
>   larl  %r2, .LC1         # load "%d\n"
>   brasl %r14, printf      # call printf
>
> The cse1 pass detects that "larl %r3, a" and "lgf %r3, 0(%r3)"
> can use the same internal register and unifies them.
>
> loop2_invariant then realizes that "larl  %r3, a" is loop-invariant and
> moves it out of the loop leaving a register-register move in the loop
> that should be removed in some subsequent pass.
> However, cprop3 replaces the source register of this move by the
> constant symbolref "a" again,

To me that sounds like the real issue.  cprop shouldn't undo what
invariant motion did (or the copy should be propagated out before
cprop)

> the moved "larl  %r3, a" is removed by
> cse2 directly afterwards and we are back to square one...
> (I could also post the corresponding RTL if requested)
>
> The problem begins, imho, when cse1 extends the BB of the loop to the BB
> after the loop and uses the same register for the load in the loop as
> well as after the loop. This prevents PRE, cprop and loop2_invariant
> from correctly moving the load outside of the loop.
>
> I wonder if this behavior is intended in general or if I'm missing
> something? On i386, my example works because the load of "a" before
> printf is handled differently.
>
> I wrote a patch that disables basic block extension out of loops in the
> first CSE pass for s390 which fixes the problem for me. It uses a target
> hook and should only affect s390 but is not yet thoroughly tested.
>
> Any opinions/remarks on this?
>
> Regards
>  Robin

Reply via email to