On 12/18/2017 03:15 AM, Richard Biener wrote:
> On Fri, Dec 15, 2017 at 6:00 PM, Richard Biener
> <richard.guent...@gmail.com> wrote:
>> On December 15, 2017 5:19:14 PM GMT+01:00, Jeff Law <l...@redhat.com> wrote:
>>> I hate this patch.
>>>
>>> The fundamental problem we have is that there are times when we very
>>> much want to thread jumps and yet there are other times when we do
>>> not.
>>>
>>> To date we have been able to largely select between those two by
>>> looking at the shape of the CFG and the jump thread to see how
>>> threading a particular jump would impact the shape of the CFG
>>> (particularly loops in the CFG).
>>>
>>> In this BZ we have a loop like this:
>>>
>>>      2
>>>      |
>>>      3<--------+
>>>      |         |
>>>      4<---+    |
>>>     / \   |    |
>>>    5   6  |    |
>>>     \ /   |    |
>>>      7    |    |
>>>     / \   |    |
>>>    8  11--+    |
>>>   / \          |
>>>  9  10---------+
>>>  |
>>>  E
>>>
>>> We want to thread the path (6, 7) (7, 11).  ie, there's a path
>>> through that inner loop where we don't need to do the loop test.
>>> Here's an example (from libgomp testsuite)
>>>
>>> (N is 1500)
>>>
>>>   for (j = 0; j < N; j++)
>>>     {
>>>       if (j > 500)
>>>         {
>>>           x[i][j] = i + j + 3;
>>>           y[j] = i*j + 10;
>>>         }
>>>       else
>>>         x[i][j] = x[i][j]*3;
>>>     }
>>>
>>> Threading (in effect) puts the loop test into both arms of the
>>> conditional, then removes it from the ELSE arm because the condition
>>> is always "keep looping" in that arm.
>>>
>>> This plays poorly with loop parallelization -- I tried to follow
>>> what's going on in there and just simply got lost.  I got as far as
>>> seeing that the parallelization code thought there were loop carried
>>> dependencies.  At least that's how it looked to me.  But I don't see
>>> any loop carried dependencies in the code.
>>
>> Hmm.  I'll double check if I remember on Monday.
>
> So I did and the ultimate reason GRAPHITE FAILs is because for
> the loop with the threading applied we can no longer compute
> number of iterations.  That's of course bad for _all_ loop optimizers,
> not just GRAPHITE (we just don't have any other test coverage).
> The main issue here is that after the threading is applied the
> block with the loop exit no longer dominates the loop latch.
> This means the loop should really be split but it also means we should
> definitely avoid performing such threading before loop optimizers run,
> and not just if parallelizing loops.
>
> Can you adjust your patch this way?  The condition you check seems
> to more or less match the case where we thread through the exit
> test _into_ the loop?  The bad thing is really if in the resulting CFG
> the exit test isn't dominating the latch (thus we have an exit test
> that is not always executed):

The problem is we'll end up regressing other vectorization cases.  I
looked at a variety of things WRT the shape of the CFG and ultimately
found that there wasn't anything I could use to distinguish the two
cases.  I'll look again though.
Jeff
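
For illustration, here is a rough source-level sketch of the inner loop
after the (6, 7) (7, 11) thread is applied.  This is a hand-written
approximation of the effect described above, not actual GCC output; the
point is that the exit test survives only on the j > 500 arm, so it no
longer dominates the latch and number-of-iterations analysis no longer
sees a single, always-executed exit test:

  /* Sketch of the loop after threading (6, 7) (7, 11).
     On the else arm j <= 500, so j + 1 <= 501 < N (N is 1500): the
     "keep looping" outcome is known and the exit test is skipped.  */
  j = 0;
  do
    {
      if (j > 500)
        {
          x[i][j] = i + j + 3;
          y[j] = i*j + 10;
          j++;
          if (j >= N)       /* exit test now reached only on this arm */
            break;
        }
      else
        {
          x[i][j] = x[i][j]*3;
          j++;              /* no exit test: threading proved j < N here */
        }
    }
  while (1);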