On December 15, 2017 5:19:14 PM GMT+01:00, Jeff Law <l...@redhat.com> wrote: >I hate this patch. > >The fundamental problem we have is that there are times when we very >much want to thread jumps and yet there are other times when we do not. > >To date we have been able to largely select between those two by >looking >at the shape of the CFG and the jump thread to see how threading a >particular jump would impact the shape of the CFG (particularly loops >in >the CFG). > >In this BZ we have a loop like this: > > 2 > | > 3<-------+ > | | > 4<---+ | > / \ | | > 5 6 | | > \ / | | > 7 | | > / \ | | > 8 11-+ | > / \ | > 9 10-------+ > | > E > >We want to thread the path (6, 7) (7, 11). ie, there's a path through >that inner loop where we don't need to do the loop test. Here's an >example (from libgomp testsuite) > >(N is 1500) > > for (j = 0; j < N; j++) > { > if (j > 500) > { > x[i][j] = i + j + 3; > y[j] = i*j + 10; > } > else > x[i][j] = x[i][j]*3; > } > > > >Threading (in effect) puts the loop test into both arms of the >conditional, then removes it from the ELSE arm because the condition is >always "keep looping" in that arm. > >This plays poorly with loop parallelization -- I tried to follow what's >going on in there and just simply got lost. I got as far as seeing >that >the parallelization code thought there were loop carried dependencies. >At least that's how it looked to me. But I don't see any loop carried >dependencies in the code.
Hmm. I'll double check if I remember on Monday. > >You might naturally ask if threading this is actually wise. It seems >to >broadly fit into the cases where we throttle threading so as not to >muck >around too much with the loop structure. It's not terrible to detect a >CFG of this shape and avoid threading. > >BUT.... > > >You can have essentially the same shape CFG (not 100% the same, but the >same key characteristics), but where jump threading simplifies things >in >ways that are good for the SLP vectorizer (vect/bb-slp-16.c) or where >jump threading avoids spurious warnings (graphite/scop-4.c) > >Initially I thought I'd seen a key difference in the contents of the >latch block, but I think I was just up too late and mis-read the dump. > >So I don't see anything in the CFG shape or the contents of the blocks >that can be reasonably analyzed at jump threading time. Given we have >two opposite needs and no reasonable way I can find to select between >them, I'm resorting to a disgusting hack. Namely to avoid threading >through the latch to another point in the loop *when loop >parallelization is enabled*. > >Let me be clear. This is a hack. I don't like it, not one little bit. >But I don't see a way to resolve the regression without introducing >other regressions and the loop parallelization code is a total mystery >to me. > >Bootstrapped on x86_64 and regression tested with and without graphite. >Confirmed it fixes the graphite related regressions mentioned in the BZ >on x86_64. > >Committing to the trunk and hanging my head in shame. I'd not have worried much about auto parallekization or graphite here. It does look like a missed handling there. Richard. >Jeff