RE: [PATCH 7/21]middle-end: update IV update code to support early breaks and arbitrary exits

Richard Biener Thu, 16 Nov 2023 06:18:03 -0800

On Thu, 16 Nov 2023, Tamar Christina wrote:

> > -----Original Message-----
> > From: Richard Biener <[email protected]>
> > Sent: Thursday, November 16, 2023 1:36 PM
> > To: Tamar Christina <[email protected]>
> > Cc: [email protected]; nd <[email protected]>; [email protected]
> > Subject: RE: [PATCH 7/21]middle-end: update IV update code to support early
> > breaks and arbitrary exits
> > 
> > On Thu, 16 Nov 2023, Tamar Christina wrote:
> > 
> > > > > > > > >
> > > > > > > > > Perhaps I'm missing something here?
> > > > > > > >
> > > > > > > > OK, so I refreshed my mind of what
> > > > > > > > vect_update_ivs_after_vectorizer
> > > > > > does.
> > > > > > > >
> > > > > > > > I still do not understand the (complexity of the) patch.
> > > > > > > > Basically the function computes the new value of the IV
> > > > > > > > "from scratch" based on the number of scalar iterations of
> > > > > > > > the vector loop,
> > > > the 'niter'
> > > > > > > > argument.  I would have expected that for the early exits we
> > > > > > > > either pass in a different 'niter' or alternatively a 
> > > > > > > > 'niter_adjustment'.
> > > > > > >
> > > > > > > But for an early exit there's no static value for adjusted
> > > > > > > niter, since you don't know which iteration you exited from.
> > > > > > > Unlike the normal exit when you know if you get there you've
> > > > > > > done all possible
> > > > > > iterations.
> > > > > > >
> > > > > > > So you must compute the scalar iteration count on the exit itself.
> > > > > >
> > > > > > ?  You do not need the actual scalar iteration you exited (you
> > > > > > don't compute that either), you need the scalar iteration the
> > > > > > vector iteration started with when it exited prematurely and
> > > > > > that's readily
> > > > available?
> > > > >
> > > > > For a normal exit yes, not for an early exit no?
> > > > > niters_vector_mult_vf is only valid for the main exit.
> > > > >
> > > > > There's the unadjusted scalar count, which is what it's using to
> > > > > adjust it to the final count.  Unless I'm missing something?
> > > >
> > > > Ah, of course - niters_vector_mult_vf is for the countable exit.
> > > > For the early exits we can't precompute the scalar iteration value.
> > > > But that then means we should compute the appropriate "continuation"
> > > > as live value of the vectorized IVs even when they were not
> > > > originally used outside of the loop.  I don't see how we can express
> > > > this in terms of the scalar IVs in the (not yet) vectorized loop -
> > > > similar to the reduction case you are going to end up with the wrong 
> > > > values
> > here.
> > > >
> > > > That said, I've for a long time wanted to preserve the original
> > > > control IV also for the vector code (leaving any "optimization"
> > > > to IVOPTs there), that would enable us to compute the correct
> > > > "niters_vector_mult_vf" based on that IV.
> > > >
> > > > So given we cannot use the scalar IVs you have to handle all
> > > > inductions (besides the main exit control IV) in 
> > > > vectorizable_live_operation
> > I think.
> > > >
> > >
> > > That's what I currently do, that's why there was the
> > >         if (STMT_VINFO_LIVE_P (phi_info))
> > >           continue;
> > 
> > Yes, but that only works for the inductions marked so.  We'd need to mark 
> > the
> > others as well, but only for the early exits.
> > 
> > > although I don't understand why we use the scalar count,  I suppose
> > > the reasoning is that we don't really want to keep it around, and 
> > > referencing
> > it forces it to be kept?
> > 
> > Referencing it will cause the scalar compute to be retained, but since we 
> > do not
> > adjust the scalar compute during vectorization (but expect it to be dead) 
> > the
> > scalar compute will compute the wrong thing (as shown by the reduction
> > example - I suspect inductions will suffer from the same problem).
> > 
> > > At the moment it just does `init + (final - init) * vf` which is correct 
> > > no?
> > 
> > The issue is that 'final' is not computed correctly in the vectorized loop. 
> >  This
> > formula might work for affine evolutions of course.
> > 
> > Extracting the correct value from the vectorized induction would be the
> > preferred solution.
> 
> Ok, so I should be able to just mark IVs as live during process_use if there 
> are
> multiple exits right? Since it's just gonna be unused on the main exit since 
> we
> use niters?
> 
> Because since it's the PHI inside the loop that needs to be marked live I 
> can't
> just do it for a specific exits no?
> 
> If I create a copy of the PHI node during peeling for use in early exits and 
> mark
> it live it won't work no?


I guess I wouldn't actually mark it STMT_VINFO_LIVE_P but somehow
arrange vectorizable_live_operation to be called, possibly adding
a edge argument to that as well.

Maybe the thing to do for the moment is to reject vectorization with
early breaks if there's any (non-STMT_VINFO_LIVE_P?) induction or
reduction besides the main counting IV one you can already
special-case?

Richard.

> Tamar
> > 
> > > Also you missed the question below about how to avoid the creation of
> > > the block, You ok with changing that?
> > >
> > > Thanks,
> > > Tamar
> > >
> > > > Or for now disable early-break for inductions that are not the main
> > > > exit control IV (in vect_can_advance_ivs_p)?
> > > >
> > > > > > > >
> > > > > > > > It seems your change handles different kinds of inductions
> > differently.
> > > > > > > > Specifically
> > > > > > > >
> > > > > > > >       bool ivtemp = gimple_cond_lhs (cond) == iv_var;
> > > > > > > >       if (restart_loop && ivtemp)
> > > > > > > >         {
> > > > > > > >           type = TREE_TYPE (gimple_phi_result (phi));
> > > > > > > >           ni = build_int_cst (type, vf);
> > > > > > > >           if (inversed_iv)
> > > > > > > >             ni = fold_build2 (MINUS_EXPR, type, ni,
> > > > > > > >                               fold_convert (type, step_expr));
> > > > > > > >         }
> > > > > > > >
> > > > > > > > it looks like for the exit test IV we use either 'VF' or 'VF - 
> > > > > > > > step'
> > > > > > > > as the new value.  That seems to be very odd special casing
> > > > > > > > for unknown reasons.  And while you adjust vec_step_op_add,
> > > > > > > > you don't adjust vect_peel_nonlinear_iv_init (maybe not
> > > > > > > > supported - better assert
> > > > > > here).
> > > > > > >
> > > > > > > The VF case is for a normal "non-inverted" loop, where if you
> > > > > > > take an early exit you know that you have to do at most VF 
> > > > > > > iterations.
> > > > > > > The VF
> > > > > > > - step is to account for the inverted loop control flow where
> > > > > > > you exit after adjusting the IV already by + step.
> > > > > >
> > > > > > But doesn't that assume the IV counts from niter to zero?  I
> > > > > > don't see this special case is actually necessary, no?
> > > > > >
> > > > >
> > > > > I needed it because otherwise the scalar loop iterates one
> > > > > iteration too little So I got a miscompile with the inverter loop
> > > > > stuff.  I'll look at it again perhaps It can be solved differently.
> > > > >
> > > > > > >
> > > > > > > Peeling doesn't matter here, since you know you were able to
> > > > > > > do a vector iteration so it's safe to do VF iterations.  So
> > > > > > > having peeled doesn't affect the remaining iters count.
> > > > > > >
> > > > > > > >
> > > > > > > > Also the vec_step_op_add case will keep the original scalar
> > > > > > > > IV live even when it is a vectorized induction.  The code
> > > > > > > > recomputing the value from scratch avoids this.
> > > > > > > >
> > > > > > > >       /* For non-main exit create an intermediat edge to get
> > > > > > > > any updated
> > > > iv
> > > > > > > >          calculations.  */
> > > > > > > >       if (needs_interm_block
> > > > > > > >           && !iv_block
> > > > > > > >           && (!gimple_seq_empty_p (stmts) ||
> > > > > > > > !gimple_seq_empty_p
> > > > > > > > (new_stmts)))
> > > > > > > >         {
> > > > > > > >           iv_block = split_edge (update_e);
> > > > > > > >           update_e = single_succ_edge (update_e->dest);
> > > > > > > >           last_gsi = gsi_last_bb (iv_block);
> > > > > > > >         }
> > > > > > > >
> > > > > > > > this is also odd, can we adjust the API instead?  I suppose
> > > > > > > > this is because your computation uses the original loop IV,
> > > > > > > > if you based the computation off the initial value only this
> > > > > > > > might not be
> > > > necessary?
> > > > > > >
> > > > > > > No, on the main exit the code updates the value in the loop
> > > > > > > header and puts the Calculation in the merge block.  This
> > > > > > > works because it only needs to consume PHI nodes in the merge
> > > > > > > block and things like niters are
> > > > > > adjusted in the guard block.
> > > > > > >
> > > > > > > For an early exit, we don't have a guard block, only the merge 
> > > > > > > block.
> > > > > > > We have to update the PHI nodes in that block,  but can't do
> > > > > > > so since you can't produce a value and consume it in a PHI
> > > > > > > node in the same
> > > > BB.
> > > > > > > So we need to create the block to put the values in for use in
> > > > > > > the merge block.  Because there's no "guard" block for early 
> > > > > > > exits.
> > > > > >
> > > > > > ?  then compute niters in that block as well.
> > > > >
> > > > > We can't since it'll not be reachable through the right edge.
> > > > > What we can do if you want is slightly change peeling, we currently 
> > > > > peel
> > as:
> > > > >
> > > > >   \        \             /
> > > > >   E1     E2        Normal exit
> > > > >     \       |          |
> > > > >        \    |          Guard
> > > > >           \ |          |
> > > > >          Merge block
> > > > >                   |
> > > > >              Pre Header
> > > > >
> > > > > If we instead peel as:
> > > > >
> > > > >
> > > > >   \        \             /
> > > > >   E1     E2        Normal exit
> > > > >     \       |          |
> > > > >        Exit join   Guard
> > > > >           \ |          |
> > > > >          Merge block
> > > > >                   |
> > > > >              Pre Header
> > > > >
> > > > > We can use the exit join block.  This would also mean
> > > > > vect_update_ivs_after_vectorizer Doesn't need to iterate over all
> > > > > exits and only really needs to adjust the phi nodes Coming out of
> > > > > the exit join
> > > > and guard block.
> > > > >
> > > > > Does this work for you?
> > 
> > Yeah, I think that would work.  But I'd like to sort out the correctness 
> > details of
> > the IV update itself before sorting out this code placement detail.
> > 
> > Richard.
> > 
> > > > > Thanks,
> > > > > Tamar
> > > > > >
> > > > > > > The API can be adjusted by always creating the empty block
> > > > > > > either during
> > > > > > peeling.
> > > > > > > That would prevent us from having to do anything special here.
> > > > > > > Would that work better?  Or I can do it in the loop that
> > > > > > > iterates over the exits to before the call to
> > > > > > > vect_update_ivs_after_vectorizer, which I think
> > > > > > might be more consistent.
> > > > > > >
> > > > > > > >
> > > > > > > > That said, I wonder why we cannot simply pass in an adjusted
> > > > > > > > niter which would be niters_vector_mult_vf - vf and be done with
> > that?
> > > > > > > >
> > > > > > >
> > > > > > > We can ofcourse not have this and recompute it from niters
> > > > > > > itself, however this does affect the epilog code layout.
> > > > > > > Particularly knowing the static number if iterations left
> > > > > > > causes it to usually unroll the loop and share some of the
> > > > > > > computations.  i.e. the scalar code is often more
> > > > > > efficient.
> > > > > > >
> > > > > > > The computation would be niters_vector_mult_vf - iters_done *
> > > > > > > vf, since the value put Here is the remaining iteration count.
> > > > > > > It's static for early
> > > > > > exits.
> > > > > >
> > > > > > Well, it might be "static" in that it doesn't really matter what
> > > > > > you use for the epilog main IV initial value as long as you are
> > > > > > sure you're not going to take that exit as you are sure we're
> > > > > > going to take one of the early exits.  So yeah, the special code
> > > > > > is probably OK, but it needs a better comment and as said the
> > > > > > structure of
> > > > vect_update_ivs_after_vectorizer is a bit hard to follow now.
> > > > > >
> > > > > > As said an important part for optimization is to not keep the
> > > > > > scalar IVs live in the vector loop.
> > > > > >
> > > > > > > But can do whatever you prefer here.  Let me know what you
> > > > > > > prefer for the
> > > > > > above.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Tamar
> > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Richard.
> > > > > > > >
> > > > > > > >
> > > > > > > > > Regards,
> > > > > > > > > Tamar
> > > > > > > > > >
> > > > > > > > > > > It has to do this since you have to perform the side
> > > > > > > > > > > effects for the non-matching elements still.
> > > > > > > > > > >
> > > > > > > > > > > Regards,
> > > > > > > > > > > Tamar
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > +           if (STMT_VINFO_LIVE_P (phi_info))
> > > > > > > > > > > > > +             continue;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +           /* For early break the final loop IV is:
> > > > > > > > > > > > > +              init + (final - init) * vf which takes 
> > > > > > > > > > > > > into
> > > > > > > > > > > > > +account
> > > > peeling
> > > > > > > > > > > > > +              values and non-single steps.  The main 
> > > > > > > > > > > > > exit
> > > > can
> > > > > > > > > > > > > +use
> > > > > > > > niters
> > > > > > > > > > > > > +              since if you exit from the main exit 
> > > > > > > > > > > > > you've
> > > > done
> > > > > > > > > > > > > +all
> > > > > > > > vector
> > > > > > > > > > > > > +              iterations.  For an early exit we 
> > > > > > > > > > > > > don't know
> > > > when
> > > > > > > > > > > > > +we
> > > > > > > > exit
> > > > > > > > > > > > > +so
> > > > > > > > > > > > we
> > > > > > > > > > > > > +              must re-calculate this on the exit.  */
> > > > > > > > > > > > > +           tree start_expr = gimple_phi_result (phi);
> > > > > > > > > > > > > +           off = fold_build2 (MINUS_EXPR, stype,
> > > > > > > > > > > > > +                              fold_convert (stype,
> > > > start_expr),
> > > > > > > > > > > > > +                              fold_convert (stype,
> > > > init_expr));
> > > > > > > > > > > > > +           /* Now adjust for VF to get the final 
> > > > > > > > > > > > > iteration value.
> > > > */
> > > > > > > > > > > > > +           off = fold_build2 (MULT_EXPR, stype, off,
> > > > > > > > > > > > > +                              build_int_cst (stype, 
> > > > > > > > > > > > > vf));
> > > > > > > > > > > > > +         }
> > > > > > > > > > > > > +       else
> > > > > > > > > > > > > +         off = fold_build2 (MULT_EXPR, stype,
> > > > > > > > > > > > > +                            fold_convert (stype, 
> > > > > > > > > > > > > niters),
> > > > step_expr);
> > > > > > > > > > > > > +
> > > > > > > > > > > > >         if (POINTER_TYPE_P (type))
> > > > > > > > > > > > >           ni = fold_build_pointer_plus (init_expr, 
> > > > > > > > > > > > > off);
> > > > > > > > > > > > >         else
> > > > > > > > > > > > > @@ -2238,6 +2286,8 @@
> > > > > > > > > > > > > vect_update_ivs_after_vectorizer (loop_vec_info
> > > > > > > > > > > > loop_vinfo,
> > > > > > > > > > > > >        /* Don't bother call 
> > > > > > > > > > > > > vect_peel_nonlinear_iv_init.  */
> > > > > > > > > > > > >        else if (induction_type == vect_step_op_neg)
> > > > > > > > > > > > >       ni = init_expr;
> > > > > > > > > > > > > +      else if (restart_loop)
> > > > > > > > > > > > > +     continue;
> > > > > > > > > > > >
> > > > > > > > > > > > This looks all a bit complicated - why wouldn't we
> > > > > > > > > > > > simply always use the PHI result when 'restart_loop'?
> > > > > > > > > > > > Isn't that the correct old start value in
> > > > > > > > > > all cases?
> > > > > > > > > > > >
> > > > > > > > > > > > >        else
> > > > > > > > > > > > >       ni = vect_peel_nonlinear_iv_init (&stmts, 
> > > > > > > > > > > > > init_expr,
> > > > > > > > > > > > >                                         niters, 
> > > > > > > > > > > > > step_expr,
> > @@ -
> > > > > > 2245,9 +2295,20 @@
> > > > > > > > > > > > > vect_update_ivs_after_vectorizer
> > > > > > > > > > > > (loop_vec_info
> > > > > > > > > > > > > loop_vinfo,
> > > > > > > > > > > > >
> > > > > > > > > > > > >        var = create_tmp_var (type, "tmp");
> > > > > > > > > > > > >
> > > > > > > > > > > > > -      last_gsi = gsi_last_bb (exit_bb);
> > > > > > > > > > > > >        gimple_seq new_stmts = NULL;
> > > > > > > > > > > > >        ni_name = force_gimple_operand (ni,
> > > > > > > > > > > > > &new_stmts, false, var);
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +      /* For non-main exit create an intermediat
> > > > > > > > > > > > > + edge to get any
> > > > > > > > updated iv
> > > > > > > > > > > > > +      calculations.  */
> > > > > > > > > > > > > +      if (needs_interm_block
> > > > > > > > > > > > > +       && !iv_block
> > > > > > > > > > > > > +       && (!gimple_seq_empty_p (stmts) ||
> > > > > > > > > > > > > +!gimple_seq_empty_p
> > > > > > > > > > > > (new_stmts)))
> > > > > > > > > > > > > +     {
> > > > > > > > > > > > > +       iv_block = split_edge (update_e);
> > > > > > > > > > > > > +       update_e = single_succ_edge (update_e->dest);
> > > > > > > > > > > > > +       last_gsi = gsi_last_bb (iv_block);
> > > > > > > > > > > > > +     }
> > > > > > > > > > > > > +
> > > > > > > > > > > > >        /* Exit_bb shouldn't be empty.  */
> > > > > > > > > > > > >        if (!gsi_end_p (last_gsi))
> > > > > > > > > > > > >       {
> > > > > > > > > > > > > @@ -3342,8 +3403,26 @@ vect_do_peeling
> > > > > > > > > > > > > (loop_vec_info loop_vinfo, tree
> > > > > > > > > > > > niters, tree nitersm1,
> > > > > > > > > > > > >        niters_vector_mult_vf steps.  */
> > > > > > > > > > > > >        gcc_checking_assert (vect_can_advance_ivs_p
> > > > (loop_vinfo));
> > > > > > > > > > > > >        update_e = skip_vector ? e : 
> > > > > > > > > > > > > loop_preheader_edge
> > (epilog);
> > > > > > > > > > > > > -      vect_update_ivs_after_vectorizer (loop_vinfo,
> > > > > > > > niters_vector_mult_vf,
> > > > > > > > > > > > > -                                     update_e);
> > > > > > > > > > > > > +      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > > > > > > > > > > +     update_e = single_succ_edge (e->dest);
> > > > > > > > > > > > > +      bool inversed_iv
> > > > > > > > > > > > > +     = !vect_is_loop_exit_latch_pred
> > > > (LOOP_VINFO_IV_EXIT
> > > > > > > > (loop_vinfo),
> > > > > > > > > > > > > +                                      LOOP_VINFO_LOOP
> > > > > > > > (loop_vinfo));
> > > > > > > > > > > >
> > > > > > > > > > > > You are computing this here and in
> > > > > > vect_update_ivs_after_vectorizer?
> > > > > > > > > > > >
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +      /* Update the main exit first.  */
> > > > > > > > > > > > > +      vect_update_ivs_after_vectorizer
> > > > > > > > > > > > > + (loop_vinfo, vf,
> > > > > > > > > > niters_vector_mult_vf,
> > > > > > > > > > > > > +                                     update_e,
> > > > inversed_iv);
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +      /* And then update the early exits.  */
> > > > > > > > > > > > > +      for (auto exit : get_loop_exit_edges (loop))
> > > > > > > > > > > > > +     {
> > > > > > > > > > > > > +       if (exit == LOOP_VINFO_IV_EXIT (loop_vinfo))
> > > > > > > > > > > > > +         continue;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       vect_update_ivs_after_vectorizer (loop_vinfo,
> > > > > > > > > > > > > +vf,
> > > > > > > > > > > > > +
> > > > niters_vector_mult_vf,
> > > > > > > > > > > > > +                                         exit, true);
> > > > > > > > > > > >
> > > > > > > > > > > > ... why does the same not work here?  Wouldn't the
> > > > > > > > > > > > proper condition be !dominated_by_p (CDI_DOMINATORS,
> > > > > > > > > > > > exit->src, LOOP_VINFO_IV_EXIT
> > > > > > > > > > > > (loop_vinfo)->src) or similar?  That is, whether the
> > > > > > > > > > > > exit is at or after the main IV exit?  (consider
> > > > > > > > > > > > having
> > > > > > > > > > > > two)
> > > > > > > > > > > >
> > > > > > > > > > > > > +     }
> > > > > > > > > > > > >
> > > > > > > > > > > > >        if (skip_epilog)
> > > > > > > > > > > > >       {
> > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > Richard Biener <[email protected]> SUSE Software
> > > > > > > > > > Solutions Germany GmbH, Frankenstrasse 146, 90461
> > > > > > > > > > Nuernberg, Germany;
> > > > > > > > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB
> > > > > > > > > > 36809, AG
> > > > > > > > > > Nuernberg)
> > > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Richard Biener <[email protected]> SUSE Software Solutions
> > > > > > > > Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany;
> > > > > > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809,
> > > > > > > > AG
> > > > > > > > Nuernberg)
> > > > > > >
> > > > > >
> > > > > > --
> > > > > > Richard Biener <[email protected]> SUSE Software Solutions
> > > > > > Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany;
> > > > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> > > > > > Nuernberg)
> > > > >
> > > >
> > > > --
> > > > Richard Biener <[email protected]>
> > > > SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461
> > > > Nuernberg, Germany;
> > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> > > > Nuernberg)
> > >
> > 
> > --
> > Richard Biener <[email protected]>
> > SUSE Software Solutions Germany GmbH,
> > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> > Nuernberg)
> 

-- 
Richard Biener <[email protected]>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

RE: [PATCH 7/21]middle-end: update IV update code to support early breaks and arbitrary exits

Reply via email to