> > > > > > > > > > > > Perhaps I'm missing something here? > > > > > > > > > > OK, so I refreshed my mind of what > > > > > vect_update_ivs_after_vectorizer > > > does. > > > > > > > > > > I still do not understand the (complexity of the) patch. > > > > > Basically the function computes the new value of the IV "from > > > > > scratch" based on the number of scalar iterations of the vector loop, > the 'niter' > > > > > argument. I would have expected that for the early exits we > > > > > either pass in a different 'niter' or alternatively a > > > > > 'niter_adjustment'. > > > > > > > > But for an early exit there's no static value for adjusted niter, > > > > since you don't know which iteration you exited from. Unlike the > > > > normal exit when you know if you get there you've done all > > > > possible > > > iterations. > > > > > > > > So you must compute the scalar iteration count on the exit itself. > > > > > > ? You do not need the actual scalar iteration you exited (you don't > > > compute that either), you need the scalar iteration the vector > > > iteration started with when it exited prematurely and that's readily > available? > > > > For a normal exit yes, not for an early exit no? niters_vector_mult_vf > > is only valid for the main exit. > > > > There's the unadjusted scalar count, which is what it's using to > > adjust it to the final count. Unless I'm missing something? > > Ah, of course - niters_vector_mult_vf is for the countable exit. For the > early > exits we can't precompute the scalar iteration value. But that then means we > should compute the appropriate "continuation" as live value of the vectorized > IVs even when they were not originally used outside of the loop. I don't see > how we can express this in terms of the scalar IVs in the (not yet) vectorized > loop - similar to the reduction case you are going to end up with the wrong > values here. > > That said, I've for a long time wanted to preserve the original control IV > also for > the vector code (leaving any "optimization" > to IVOPTs there), that would enable us to compute the correct > "niters_vector_mult_vf" based on that IV. > > So given we cannot use the scalar IVs you have to handle all inductions > (besides the main exit control IV) in vectorizable_live_operation I think. >
That's what I currently do, that's why there was the if (STMT_VINFO_LIVE_P (phi_info)) continue; although I don't understand why we use the scalar count, I suppose the reasoning is that we don't really want to keep it around, and referencing it forces it to be kept? At the moment it just does `init + (final - init) * vf` which is correct no? Also you missed the question below about how to avoid the creation of the block, You ok with changing that? Thanks, Tamar > Or for now disable early-break for inductions that are not the main exit > control > IV (in vect_can_advance_ivs_p)? > > > > > > > > > > > It seems your change handles different kinds of inductions > > > > > differently. > > > > > Specifically > > > > > > > > > > bool ivtemp = gimple_cond_lhs (cond) == iv_var; > > > > > if (restart_loop && ivtemp) > > > > > { > > > > > type = TREE_TYPE (gimple_phi_result (phi)); > > > > > ni = build_int_cst (type, vf); > > > > > if (inversed_iv) > > > > > ni = fold_build2 (MINUS_EXPR, type, ni, > > > > > fold_convert (type, step_expr)); > > > > > } > > > > > > > > > > it looks like for the exit test IV we use either 'VF' or 'VF - step' > > > > > as the new value. That seems to be very odd special casing for > > > > > unknown reasons. And while you adjust vec_step_op_add, you > > > > > don't adjust vect_peel_nonlinear_iv_init (maybe not supported - > > > > > better assert > > > here). > > > > > > > > The VF case is for a normal "non-inverted" loop, where if you take > > > > an early exit you know that you have to do at most VF iterations. > > > > The VF > > > > - step is to account for the inverted loop control flow where you > > > > exit after adjusting the IV already by + step. > > > > > > But doesn't that assume the IV counts from niter to zero? I don't > > > see this special case is actually necessary, no? > > > > > > > I needed it because otherwise the scalar loop iterates one iteration > > too little So I got a miscompile with the inverter loop stuff. I'll > > look at it again perhaps It can be solved differently. > > > > > > > > > > Peeling doesn't matter here, since you know you were able to do a > > > > vector iteration so it's safe to do VF iterations. So having > > > > peeled doesn't affect the remaining iters count. > > > > > > > > > > > > > > Also the vec_step_op_add case will keep the original scalar IV > > > > > live even when it is a vectorized induction. The code > > > > > recomputing the value from scratch avoids this. > > > > > > > > > > /* For non-main exit create an intermediat edge to get any > > > > > updated > iv > > > > > calculations. */ > > > > > if (needs_interm_block > > > > > && !iv_block > > > > > && (!gimple_seq_empty_p (stmts) || !gimple_seq_empty_p > > > > > (new_stmts))) > > > > > { > > > > > iv_block = split_edge (update_e); > > > > > update_e = single_succ_edge (update_e->dest); > > > > > last_gsi = gsi_last_bb (iv_block); > > > > > } > > > > > > > > > > this is also odd, can we adjust the API instead? I suppose this > > > > > is because your computation uses the original loop IV, if you > > > > > based the computation off the initial value only this might not be > necessary? > > > > > > > > No, on the main exit the code updates the value in the loop header > > > > and puts the Calculation in the merge block. This works because > > > > it only needs to consume PHI nodes in the merge block and things > > > > like niters are > > > adjusted in the guard block. > > > > > > > > For an early exit, we don't have a guard block, only the merge block. > > > > We have to update the PHI nodes in that block, but can't do so > > > > since you can't produce a value and consume it in a PHI node in the same > BB. > > > > So we need to create the block to put the values in for use in the > > > > merge block. Because there's no "guard" block for early exits. > > > > > > ? then compute niters in that block as well. > > > > We can't since it'll not be reachable through the right edge. What we > > can do if you want is slightly change peeling, we currently peel as: > > > > \ \ / > > E1 E2 Normal exit > > \ | | > > \ | Guard > > \ | | > > Merge block > > | > > Pre Header > > > > If we instead peel as: > > > > > > \ \ / > > E1 E2 Normal exit > > \ | | > > Exit join Guard > > \ | | > > Merge block > > | > > Pre Header > > > > We can use the exit join block. This would also mean > > vect_update_ivs_after_vectorizer Doesn't need to iterate over all > > exits and only really needs to adjust the phi nodes Coming out of the exit > > join > and guard block. > > > > Does this work for you? > > > > Thanks, > > Tamar > > > > > > > The API can be adjusted by always creating the empty block either > > > > during > > > peeling. > > > > That would prevent us from having to do anything special here. > > > > Would that work better? Or I can do it in the loop that iterates > > > > over the exits to before the call to > > > > vect_update_ivs_after_vectorizer, which I think > > > might be more consistent. > > > > > > > > > > > > > > That said, I wonder why we cannot simply pass in an adjusted > > > > > niter which would be niters_vector_mult_vf - vf and be done with that? > > > > > > > > > > > > > We can ofcourse not have this and recompute it from niters itself, > > > > however this does affect the epilog code layout. Particularly > > > > knowing the static number if iterations left causes it to usually > > > > unroll the loop and share some of the computations. i.e. the > > > > scalar code is often more > > > efficient. > > > > > > > > The computation would be niters_vector_mult_vf - iters_done * vf, > > > > since the value put Here is the remaining iteration count. It's > > > > static for early > > > exits. > > > > > > Well, it might be "static" in that it doesn't really matter what you > > > use for the epilog main IV initial value as long as you are sure > > > you're not going to take that exit as you are sure we're going to > > > take one of the early exits. So yeah, the special code is probably > > > OK, but it needs a better comment and as said the structure of > vect_update_ivs_after_vectorizer is a bit hard to follow now. > > > > > > As said an important part for optimization is to not keep the scalar > > > IVs live in the vector loop. > > > > > > > But can do whatever you prefer here. Let me know what you prefer > > > > for the > > > above. > > > > > > > > Thanks, > > > > Tamar > > > > > > > > > Thanks, > > > > > Richard. > > > > > > > > > > > > > > > > Regards, > > > > > > Tamar > > > > > > > > > > > > > > > It has to do this since you have to perform the side > > > > > > > > effects for the non-matching elements still. > > > > > > > > > > > > > > > > Regards, > > > > > > > > Tamar > > > > > > > > > > > > > > > > > > > > > > > > > > > + if (STMT_VINFO_LIVE_P (phi_info)) > > > > > > > > > > + continue; > > > > > > > > > > + > > > > > > > > > > + /* For early break the final loop IV is: > > > > > > > > > > + init + (final - init) * vf which takes into > > > > > > > > > > account > peeling > > > > > > > > > > + values and non-single steps. The main exit > can > > > > > > > > > > +use > > > > > niters > > > > > > > > > > + since if you exit from the main exit you've > done > > > > > > > > > > +all > > > > > vector > > > > > > > > > > + iterations. For an early exit we don't know > when > > > > > > > > > > +we > > > > > exit > > > > > > > > > > +so > > > > > > > > > we > > > > > > > > > > + must re-calculate this on the exit. */ > > > > > > > > > > + tree start_expr = gimple_phi_result (phi); > > > > > > > > > > + off = fold_build2 (MINUS_EXPR, stype, > > > > > > > > > > + fold_convert (stype, > start_expr), > > > > > > > > > > + fold_convert (stype, > init_expr)); > > > > > > > > > > + /* Now adjust for VF to get the final iteration > > > > > > > > > > value. > */ > > > > > > > > > > + off = fold_build2 (MULT_EXPR, stype, off, > > > > > > > > > > + build_int_cst (stype, vf)); > > > > > > > > > > + } > > > > > > > > > > + else > > > > > > > > > > + off = fold_build2 (MULT_EXPR, stype, > > > > > > > > > > + fold_convert (stype, niters), > step_expr); > > > > > > > > > > + > > > > > > > > > > if (POINTER_TYPE_P (type)) > > > > > > > > > > ni = fold_build_pointer_plus (init_expr, off); > > > > > > > > > > else > > > > > > > > > > @@ -2238,6 +2286,8 @@ vect_update_ivs_after_vectorizer > > > > > > > > > > (loop_vec_info > > > > > > > > > loop_vinfo, > > > > > > > > > > /* Don't bother call vect_peel_nonlinear_iv_init. */ > > > > > > > > > > else if (induction_type == vect_step_op_neg) > > > > > > > > > > ni = init_expr; > > > > > > > > > > + else if (restart_loop) > > > > > > > > > > + continue; > > > > > > > > > > > > > > > > > > This looks all a bit complicated - why wouldn't we > > > > > > > > > simply always use the PHI result when 'restart_loop'? > > > > > > > > > Isn't that the correct old start value in > > > > > > > all cases? > > > > > > > > > > > > > > > > > > > else > > > > > > > > > > ni = vect_peel_nonlinear_iv_init (&stmts, init_expr, > > > > > > > > > > niters, step_expr, @@ > > > > > > > > > > - > > > 2245,9 +2295,20 @@ > > > > > > > > > > vect_update_ivs_after_vectorizer > > > > > > > > > (loop_vec_info > > > > > > > > > > loop_vinfo, > > > > > > > > > > > > > > > > > > > > var = create_tmp_var (type, "tmp"); > > > > > > > > > > > > > > > > > > > > - last_gsi = gsi_last_bb (exit_bb); > > > > > > > > > > gimple_seq new_stmts = NULL; > > > > > > > > > > ni_name = force_gimple_operand (ni, &new_stmts, > > > > > > > > > > false, var); > > > > > > > > > > + > > > > > > > > > > + /* For non-main exit create an intermediat edge > > > > > > > > > > + to get any > > > > > updated iv > > > > > > > > > > + calculations. */ > > > > > > > > > > + if (needs_interm_block > > > > > > > > > > + && !iv_block > > > > > > > > > > + && (!gimple_seq_empty_p (stmts) || > > > > > > > > > > +!gimple_seq_empty_p > > > > > > > > > (new_stmts))) > > > > > > > > > > + { > > > > > > > > > > + iv_block = split_edge (update_e); > > > > > > > > > > + update_e = single_succ_edge (update_e->dest); > > > > > > > > > > + last_gsi = gsi_last_bb (iv_block); > > > > > > > > > > + } > > > > > > > > > > + > > > > > > > > > > /* Exit_bb shouldn't be empty. */ > > > > > > > > > > if (!gsi_end_p (last_gsi)) > > > > > > > > > > { > > > > > > > > > > @@ -3342,8 +3403,26 @@ vect_do_peeling (loop_vec_info > > > > > > > > > > loop_vinfo, tree > > > > > > > > > niters, tree nitersm1, > > > > > > > > > > niters_vector_mult_vf steps. */ > > > > > > > > > > gcc_checking_assert (vect_can_advance_ivs_p > (loop_vinfo)); > > > > > > > > > > update_e = skip_vector ? e : loop_preheader_edge > > > > > > > > > > (epilog); > > > > > > > > > > - vect_update_ivs_after_vectorizer (loop_vinfo, > > > > > niters_vector_mult_vf, > > > > > > > > > > - update_e); > > > > > > > > > > + if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)) > > > > > > > > > > + update_e = single_succ_edge (e->dest); > > > > > > > > > > + bool inversed_iv > > > > > > > > > > + = !vect_is_loop_exit_latch_pred > (LOOP_VINFO_IV_EXIT > > > > > (loop_vinfo), > > > > > > > > > > + LOOP_VINFO_LOOP > > > > > (loop_vinfo)); > > > > > > > > > > > > > > > > > > You are computing this here and in > > > vect_update_ivs_after_vectorizer? > > > > > > > > > > > > > > > > > > > + > > > > > > > > > > + /* Update the main exit first. */ > > > > > > > > > > + vect_update_ivs_after_vectorizer (loop_vinfo, > > > > > > > > > > + vf, > > > > > > > niters_vector_mult_vf, > > > > > > > > > > + update_e, > inversed_iv); > > > > > > > > > > + > > > > > > > > > > + /* And then update the early exits. */ > > > > > > > > > > + for (auto exit : get_loop_exit_edges (loop)) > > > > > > > > > > + { > > > > > > > > > > + if (exit == LOOP_VINFO_IV_EXIT (loop_vinfo)) > > > > > > > > > > + continue; > > > > > > > > > > + > > > > > > > > > > + vect_update_ivs_after_vectorizer (loop_vinfo, vf, > > > > > > > > > > + > niters_vector_mult_vf, > > > > > > > > > > + exit, true); > > > > > > > > > > > > > > > > > > ... why does the same not work here? Wouldn't the > > > > > > > > > proper condition be !dominated_by_p (CDI_DOMINATORS, > > > > > > > > > exit->src, LOOP_VINFO_IV_EXIT > > > > > > > > > (loop_vinfo)->src) or similar? That is, whether the > > > > > > > > > exit is at or after the main IV exit? (consider having > > > > > > > > > two) > > > > > > > > > > > > > > > > > > > + } > > > > > > > > > > > > > > > > > > > > if (skip_epilog) > > > > > > > > > > { > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > Richard Biener <rguent...@suse.de> SUSE Software Solutions > > > > > > > Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; > > > > > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, > > > > > > > AG > > > > > > > Nuernberg) > > > > > > > > > > > > > > > > -- > > > > > Richard Biener <rguent...@suse.de> SUSE Software Solutions > > > > > Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; > > > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG > > > > > Nuernberg) > > > > > > > > > > -- > > > Richard Biener <rguent...@suse.de> > > > SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 > > > Nuernberg, Germany; > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG > > > Nuernberg) > > > > -- > Richard Biener <rguent...@suse.de> > SUSE Software Solutions Germany GmbH, > Frankenstrasse 146, 90461 Nuernberg, Germany; > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG > Nuernberg)