On Wed, 15 Nov 2023, Tamar Christina wrote:
> Patch updated to latest trunk:
>
> Hi All,
>
> This changes the PHI node updates to support early breaks.
> It has to support both the case where the loop's exit matches the normal loop
> exit and one where the early exit is "inverted", i.e. it's an early exit edge.
>
> In the latter case we must always restart the loop for VF iterations. For an
> early exit the reason is obvious, but there are cases where the "normal" exit
> is located before the early one. This exit then does a check on ivtmp
> resulting
> in us leaving the loop since it thinks we're done.
>
> In these case we may still have side-effects to perform so we also go to the
> scalar loop.
>
> For the "normal" exit niters has already been adjusted for peeling, for the
> early exits we must find out how many iterations we actually did. So we have
> to recalculate the new position for each exit.
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> * tree-vect-loop-manip.cc (vect_set_loop_condition_normal): Hide unused.
> (vect_update_ivs_after_vectorizer): Support early break.
> (vect_do_peeling): Use it.
>
> --- inline copy of patch ---
>
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index
> d3fa8699271c4d7f404d648a38a95beabeabc99a..e1d210ab4617c894dab3d2654cf1c842baac58f5
> 100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -1200,7 +1200,7 @@ vect_set_loop_condition_partial_vectors_avx512 (class
> loop *loop,
> loop handles exactly VF scalars per iteration. */
>
> static gcond *
> -vect_set_loop_condition_normal (loop_vec_info loop_vinfo, edge exit_edge,
> +vect_set_loop_condition_normal (loop_vec_info /* loop_vinfo */, edge
> exit_edge,
> class loop *loop, tree niters, tree step,
> tree final_iv, bool niters_maybe_zero,
> gimple_stmt_iterator loop_cond_gsi)
> @@ -1412,7 +1412,7 @@ vect_set_loop_condition (class loop *loop, edge loop_e,
> loop_vec_info loop_vinfo
> When this happens we need to flip the understanding of main and other
> exits by peeling and IV updates. */
>
> -bool inline
> +bool
> vect_is_loop_exit_latch_pred (edge loop_exit, class loop *loop)
> {
> return single_pred (loop->latch) == loop_exit->src;
> @@ -2142,6 +2142,7 @@ vect_can_advance_ivs_p (loop_vec_info loop_vinfo)
> Input:
> - LOOP - a loop that is going to be vectorized. The last few iterations
> of LOOP were peeled.
> + - VF - The chosen vectorization factor for LOOP.
> - NITERS - the number of iterations that LOOP executes (before it is
> vectorized). i.e, the number of times the ivs should be
> bumped.
> - UPDATE_E - a successor edge of LOOP->exit that is on the (only) path
the comment on this is now a bit misleading, can you try to update it
and/or move the comment bits to the docs on EARLY_EXIT?
> @@ -2152,6 +2153,9 @@ vect_can_advance_ivs_p (loop_vec_info loop_vinfo)
> The phi args associated with the edge UPDATE_E in the bb
> UPDATE_E->dest are updated accordingly.
>
> + - restart_loop - Indicates whether the scalar loop needs to restart the
params are ALL_CAPS
> + iteration count where the vector loop began.
> +
> Assumption 1: Like the rest of the vectorizer, this function assumes
> a single loop exit that has a single predecessor.
>
> @@ -2169,18 +2173,22 @@ vect_can_advance_ivs_p (loop_vec_info loop_vinfo)
> */
>
> static void
> -vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
> - tree niters, edge update_e)
> +vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo, poly_uint64 vf,
LOOP_VINFO_VECT_FACTOR?
> + tree niters, edge update_e, bool restart_loop)
I think 'bool early_exit' is better here? I wonder if we have an "early"
exit after the main exit we are probably sure there are no side-effects
to re-execute and could avoid this restarting?
> {
> gphi_iterator gsi, gsi1;
> class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> basic_block update_bb = update_e->dest;
> -
> - basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> -
> - /* Make sure there exists a single-predecessor exit bb: */
> - gcc_assert (single_pred_p (exit_bb));
> - gcc_assert (single_succ_edge (exit_bb) == update_e);
> + bool inversed_iv
> + = !vect_is_loop_exit_latch_pred (LOOP_VINFO_IV_EXIT (loop_vinfo),
> + LOOP_VINFO_LOOP (loop_vinfo));
> + bool needs_interm_block = LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> + && flow_bb_inside_loop_p (loop, update_e->src);
> + edge loop_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
> + gcond *cond = get_loop_exit_condition (loop_e);
> + basic_block exit_bb = loop_e->dest;
> + basic_block iv_block = NULL;
> + gimple_stmt_iterator last_gsi = gsi_last_bb (exit_bb);
>
> for (gsi = gsi_start_phis (loop->header), gsi1 = gsi_start_phis
> (update_bb);
> !gsi_end_p (gsi) && !gsi_end_p (gsi1);
> @@ -2190,7 +2198,6 @@ vect_update_ivs_after_vectorizer (loop_vec_info
> loop_vinfo,
> tree step_expr, off;
> tree type;
> tree var, ni, ni_name;
> - gimple_stmt_iterator last_gsi;
>
> gphi *phi = gsi.phi ();
> gphi *phi1 = gsi1.phi ();
> @@ -2222,11 +2229,52 @@ vect_update_ivs_after_vectorizer (loop_vec_info
> loop_vinfo,
> enum vect_induction_op_type induction_type
> = STMT_VINFO_LOOP_PHI_EVOLUTION_TYPE (phi_info);
>
> - if (induction_type == vect_step_op_add)
> + tree iv_var = PHI_ARG_DEF_FROM_EDGE (phi, loop_latch_edge (loop));
> + /* create_iv always places it on the LHS. Alternatively we can set a
> + property during create_iv to identify it. */
> + bool ivtemp = gimple_cond_lhs (cond) == iv_var;
> + if (restart_loop && ivtemp)
> {
> + type = TREE_TYPE (gimple_phi_result (phi));
> + ni = build_int_cst (type, vf);
> + if (inversed_iv)
> + ni = fold_build2 (MINUS_EXPR, type, ni,
> + fold_convert (type, step_expr));
> + }
> + else if (induction_type == vect_step_op_add)
> + {
> +
> tree stype = TREE_TYPE (step_expr);
> - off = fold_build2 (MULT_EXPR, stype,
> - fold_convert (stype, niters), step_expr);
> +
> + /* Early exits always use last iter value not niters. */
> + if (restart_loop)
> + {
> + /* Live statements in the non-main exit shouldn't be adjusted. We
> + normally didn't have this problem with a single exit as live
> + values would be in the exit block. However when dealing with
> + multiple exits all exits are redirected to the merge block
> + and we restart the iteration. */
Hmm, I fail to see how this works - we're either using the value to
continue the induction or not, independent of STMT_VINFO_LIVE_P.
> + if (STMT_VINFO_LIVE_P (phi_info))
> + continue;
> +
> + /* For early break the final loop IV is:
> + init + (final - init) * vf which takes into account peeling
> + values and non-single steps. The main exit can use niters
> + since if you exit from the main exit you've done all vector
> + iterations. For an early exit we don't know when we exit so we
> + must re-calculate this on the exit. */
> + tree start_expr = gimple_phi_result (phi);
> + off = fold_build2 (MINUS_EXPR, stype,
> + fold_convert (stype, start_expr),
> + fold_convert (stype, init_expr));
> + /* Now adjust for VF to get the final iteration value. */
> + off = fold_build2 (MULT_EXPR, stype, off,
> + build_int_cst (stype, vf));
> + }
> + else
> + off = fold_build2 (MULT_EXPR, stype,
> + fold_convert (stype, niters), step_expr);
> +
> if (POINTER_TYPE_P (type))
> ni = fold_build_pointer_plus (init_expr, off);
> else
> @@ -2238,6 +2286,8 @@ vect_update_ivs_after_vectorizer (loop_vec_info
> loop_vinfo,
> /* Don't bother call vect_peel_nonlinear_iv_init. */
> else if (induction_type == vect_step_op_neg)
> ni = init_expr;
> + else if (restart_loop)
> + continue;
This looks all a bit complicated - why wouldn't we simply always
use the PHI result when 'restart_loop'? Isn't that the correct old start
value in all cases?
> else
> ni = vect_peel_nonlinear_iv_init (&stmts, init_expr,
> niters, step_expr,
> @@ -2245,9 +2295,20 @@ vect_update_ivs_after_vectorizer (loop_vec_info
> loop_vinfo,
>
> var = create_tmp_var (type, "tmp");
>
> - last_gsi = gsi_last_bb (exit_bb);
> gimple_seq new_stmts = NULL;
> ni_name = force_gimple_operand (ni, &new_stmts, false, var);
> +
> + /* For non-main exit create an intermediat edge to get any updated iv
> + calculations. */
> + if (needs_interm_block
> + && !iv_block
> + && (!gimple_seq_empty_p (stmts) || !gimple_seq_empty_p (new_stmts)))
> + {
> + iv_block = split_edge (update_e);
> + update_e = single_succ_edge (update_e->dest);
> + last_gsi = gsi_last_bb (iv_block);
> + }
> +
> /* Exit_bb shouldn't be empty. */
> if (!gsi_end_p (last_gsi))
> {
> @@ -3342,8 +3403,26 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree
> niters, tree nitersm1,
> niters_vector_mult_vf steps. */
> gcc_checking_assert (vect_can_advance_ivs_p (loop_vinfo));
> update_e = skip_vector ? e : loop_preheader_edge (epilog);
> - vect_update_ivs_after_vectorizer (loop_vinfo, niters_vector_mult_vf,
> - update_e);
> + if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> + update_e = single_succ_edge (e->dest);
> + bool inversed_iv
> + = !vect_is_loop_exit_latch_pred (LOOP_VINFO_IV_EXIT (loop_vinfo),
> + LOOP_VINFO_LOOP (loop_vinfo));
You are computing this here and in vect_update_ivs_after_vectorizer?
> +
> + /* Update the main exit first. */
> + vect_update_ivs_after_vectorizer (loop_vinfo, vf,
> niters_vector_mult_vf,
> + update_e, inversed_iv);
> +
> + /* And then update the early exits. */
> + for (auto exit : get_loop_exit_edges (loop))
> + {
> + if (exit == LOOP_VINFO_IV_EXIT (loop_vinfo))
> + continue;
> +
> + vect_update_ivs_after_vectorizer (loop_vinfo, vf,
> + niters_vector_mult_vf,
> + exit, true);
... why does the same not work here? Wouldn't the proper condition
be !dominated_by_p (CDI_DOMINATORS, exit->src, LOOP_VINFO_IV_EXIT
(loop_vinfo)->src) or similar? That is, whether the exit is at or
after the main IV exit? (consider having two)
> + }
>
> if (skip_epilog)
> {
>