On December 18, 2017 5:06:25 PM GMT+01:00, Jeff Law <l...@redhat.com> wrote:
>On 12/18/2017 03:15 AM, Richard Biener wrote:
>> On Fri, Dec 15, 2017 at 6:00 PM, Richard Biener
>> <richard.guent...@gmail.com> wrote:
>>> On December 15, 2017 5:19:14 PM GMT+01:00, Jeff Law <l...@redhat.com> wrote:
>>>> I hate this patch.
>>>>
>>>> The fundamental problem we have is that there are times when we very
>>>> much want to thread jumps and yet there are other times when we do
>>>> not.
>>>>
>>>> To date we have been able to largely select between those two by
>>>> looking at the shape of the CFG and the jump thread to see how
>>>> threading a particular jump would impact the shape of the CFG
>>>> (particularly loops in the CFG).
>>>>
>>>> In this BZ we have a loop like this:
>>>>
>>>>         2
>>>>         |
>>>>         3<-------+
>>>>         |        |
>>>>         4<---+   |
>>>>        / \   |   |
>>>>       5   6  |   |
>>>>        \ /   |   |
>>>>         7    |   |
>>>>        / \   |   |
>>>>       8   11-+   |
>>>>      / \         |
>>>>     9   10-------+
>>>>     |
>>>>     E
>>>>
>>>> We want to thread the path (6, 7) (7, 11).  I.e., there's a path
>>>> through that inner loop where we don't need to do the loop test.
>>>> Here's an example (from the libgomp testsuite), with N being 1500:
>>>>
>>>>   for (j = 0; j < N; j++)
>>>>     {
>>>>       if (j > 500)
>>>>         {
>>>>           x[i][j] = i + j + 3;
>>>>           y[j] = i*j + 10;
>>>>         }
>>>>       else
>>>>         x[i][j] = x[i][j]*3;
>>>>     }
>>>>
>>>> Threading (in effect) puts the loop test into both arms of the
>>>> conditional, then removes it from the ELSE arm because the condition
>>>> is always "keep looping" in that arm.
>>>>
>>>> This plays poorly with loop parallelization -- I tried to follow
>>>> what's going on in there and simply got lost.  I got as far as seeing
>>>> that the parallelization code thought there were loop-carried
>>>> dependencies.  At least that's how it looked to me.  But I don't see
>>>> any loop-carried dependencies in the code.
>>>
>>> Hmm.  I'll double-check if I remember on Monday.
>>
>> So I did, and the ultimate reason GRAPHITE FAILs is that for the loop
>> with the threading applied we can no longer compute the number of
>> iterations.  That's of course bad for _all_ loop optimizers, not just
>> GRAPHITE (we just don't have any other test coverage).  The main issue
>> here is that after the threading is applied, the block with the loop
>> exit no longer dominates the loop latch.  This means the loop should
>> really be split, but it also means we should definitely avoid
>> performing such threading before the loop optimizers run, and not just
>> when parallelizing loops.
>>
>> Can you adjust your patch this way?  The condition you check seems to
>> more or less match the case where we thread through the exit test
>> _into_ the loop.  The bad thing is really if in the resulting CFG the
>> exit test isn't dominating the latch (thus we have an exit test that
>> is not always executed):
>The problem is we'll end up regressing other vectorization cases.  I
>looked at a variety of things WRT the shape of the CFG and ultimately
>found that there wasn't anything I could use to distinguish the two
>cases.  I'll look again though.
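To make the effect concrete, at the source level the threaded loop ends
up looking roughly like the following (a sketch of the effect only, not
the actual GIMPLE; the function name is made up, N is 1500 as above):

  #define N 1500

  void
  threaded (int x[][N], int y[], int i)
  {
    int j = 0;
   loop:
    if (j > 500)
      {
        x[i][j] = i + j + 3;
        y[j] = i*j + 10;
        j++;
        if (j < N)          /* exit test survives only in this arm */
          goto loop;
      }
    else
      {
        x[i][j] = x[i][j]*3;
        j++;
        /* Exit test removed: j <= 500 implies j + 1 < N, so the
           answer here is always "keep looping".  */
        goto loop;
      }
  }

Note how the latch (block 11) is now reached from the ELSE arm without
passing through the exit test, which is exactly why the exit block no
longer dominates the latch.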
I think a good sanity check would be to assert (or dump) whenever we
change the return value of number_of_iterations_exit from before to
after threading, in case the block with the exit test is involved in
the threading path.  I guess we can't undo after the fact, but it
might help us get more testcases and thus an idea of what to look for
exactly.  Maybe something to train the very first AI inside GCC ;)

Richard.

>Jeff
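P.S. A minimal sketch of the check I have in mind, in GCC-internals
terms (where exactly to hook it into the threader is left open, and the
helper name and exact call details are from memory, so treat this as an
assumption, not a patch):

  /* Return true if we can compute the number of iterations of LOOP
     for at least one of its exits.  With EVERY_ITERATION left at its
     default of true, number_of_iterations_exit already rejects exits
     whose block does not dominate the latch -- the condition
     discussed above.  */

  static bool
  niter_computable_p (struct loop *loop)
  {
    struct tree_niter_desc niter;
    vec<edge> exits = get_loop_exit_edges (loop);
    unsigned i;
    edge exit;
    bool computable = false;

    FOR_EACH_VEC_ELT (exits, i, exit)
      if (number_of_iterations_exit (loop, exit, &niter, false))
        {
          computable = true;
          break;
        }
    exits.release ();
    return computable;
  }

  /* Hypothetical use in the threader, around the CFG update:  */

    bool before = niter_computable_p (loop);
    /* ... apply the jump-thread path ... */
    if (before && !niter_computable_p (loop) && dump_file)
      fprintf (dump_file,
               "threading made niters uncomputable for loop %d\n",
               loop->num);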