https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83126

--- Comment #3 from rguenther at suse dot de <rguenther at suse dot de> ---
On Mon, 12 Feb 2018, vries at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83126
> 
> Tom de Vries <vries at gcc dot gnu.org> changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |rguenth at gcc dot gnu.org
> 
> --- Comment #2 from Tom de Vries <vries at gcc dot gnu.org> ---
> I.
> 
> If we disable transform_to_exit_first_loop_alt, we run into this assert 
> instead
> in transform_to_exit_first_loop:
> ...
>               gcc_assert (control_name == NULL_TREE
>                           && SSA_NAME_VAR (res) == SSA_NAME_VAR (control));
> ...
> 
> So the problem is not specific to transform_to_exit_first_loop_alt.
> 
> 
> II.
> 
> This is the loop we're trying to parallelize, at dce5 (so before parloops):
> ...
>   <bb 10> [local count: 118111600]:
> 
>   <bb 5> [local count: 236258638]:
>   # _22 = PHI <0(10), _5(8)>
>   # c9_23 = PHI <c9_1(10), c9_14(8)>
>   # e1_27 = PHI <0(10), e1_17(8)>
>   # ivtmp_26 = PHI <2(10), ivtmp_24(8)>
>   _4 = (short unsigned int) e1_27;
>   c9_14 = _4 * c9_23;
>   _5 = _22 + 1;
>   e1_17 = (int) _5;
>   ivtmp_24 = ivtmp_26 - 1;
>   if (ivtmp_24 != 0)
>     goto <bb 8>; [66.67%]
>   else
>     goto <bb 7>; [33.33%]
> 
>   <bb 8> [local count: 157513634]:
>   goto <bb 5>; [100.00%]
> 
>   <bb 7> [local count: 78745004]:
>   # c9_21 = PHI <c9_14(5)>
> ...
> 
> The loop has one exit phi, and 4 loop header phis. The phis are classified in
> try_create_reduction_list.
> 
> The loop exit phi c9_21 together with loop header phi c9_23 is classified as a
> reduction. The remaining 3 loop header phis are classified as 'simple_iv'.
> 
> The idea is that the 3 ivs will be unified by canonicalize_loop_ivs before
> doing transform_to_exit_first_loop_alt or transform_to_exit_first_loop.
> 
> But, after canonicalize_loop_ivs we still have 3 loop header phis (instead of
> the two we are expecting: one for the reduction and one for the iv):
> ...
>   <bb 5> [local count: 189006911]:
>   # c9_23 = PHI <c9_14(8), c9_1(15)>
>   # e1_27 = PHI <e1_17(8), 0(15)>
>   # ivtmp_8 = PHI <ivtmp_28(8), 0(15)>
>   _22 = ivtmp_8;
>   ivtmp_26 = 2 - ivtmp_8;
>   _4 = (short unsigned int) e1_27;
>   c9_14 = _4 * c9_23;
>   _5 = _22 + 1;
>   e1_17 = (int) _5;
>   ivtmp_24 = ivtmp_26 - 1;
>   if (ivtmp_8 < 1)
>     goto <bb 8>; [66.67%]
>   else
>     goto <bb 7>; [33.33%]
> ...
> The reason for this is that the e1_27 phi is not classified as simple_iv by
> canonicalize_loop_ivs.
> 
> This again can be tracked down to the loop versioning transformation at the
> beginning of gen_parallel_loop. Before the transformation, we have 3 simple_iv
> header phis, after the transformation we have 2. This is caused by scalar
> evolution failing for e1_27.
> 
> 
> III. 
> 
> The loop versioning transformation creates a condition, and a conditional jump
> to either the original loop (which is to be transformed into a parallel loop),
> or a copy of the loop. As it happens the condition for this example is
> hardcoded to 0 != 0, meaning we always jump to the copy. This makes the
> original loop dead code, and it's possible that the scalar evolution is 
> somehow
> influenced by that.
> 
> 
> IV.
> 
> FWIW, moving canonicalize_loop_ivs to before the loop versioning fixes the 
> ICE.
> But that needlessly transforms the copy of the loop, while this is only
> necessary for the original loop (which is to be transformed into a parallel
> loop).

This is the usual "you should not repeat analysis during transform" issue.
The vectorizer gets around this by caching relevant scalar evolution
but obviously that's difficult if using generic stuff like
canonicalize_loop_ivs ...

Reply via email to