RE: [PATCH 7/21]middle-end: update IV update code to support early breaks and arbitrary exits

Tamar Christina Thu, 16 Nov 2023 05:22:45 -0800

> > > > > >
> > > > > > Perhaps I'm missing something here?
> > > > >
> > > > > OK, so I refreshed my mind of what
> > > > > vect_update_ivs_after_vectorizer
> > > does.
> > > > >
> > > > > I still do not understand the (complexity of the) patch.
> > > > > Basically the function computes the new value of the IV "from
> > > > > scratch" based on the number of scalar iterations of the vector loop,
> the 'niter'
> > > > > argument.  I would have expected that for the early exits we
> > > > > either pass in a different 'niter' or alternatively a 
> > > > > 'niter_adjustment'.
> > > >
> > > > But for an early exit there's no static value for adjusted niter,
> > > > since you don't know which iteration you exited from. Unlike the
> > > > normal exit when you know if you get there you've done all
> > > > possible
> > > iterations.
> > > >
> > > > So you must compute the scalar iteration count on the exit itself.
> > >
> > > ?  You do not need the actual scalar iteration you exited (you don't
> > > compute that either), you need the scalar iteration the vector
> > > iteration started with when it exited prematurely and that's readily
> available?
> >
> > For a normal exit yes, not for an early exit no? niters_vector_mult_vf
> > is only valid for the main exit.
> >
> > There's the unadjusted scalar count, which is what it's using to
> > adjust it to the final count.  Unless I'm missing something?
> 
> Ah, of course - niters_vector_mult_vf is for the countable exit.  For the 
> early
> exits we can't precompute the scalar iteration value.  But that then means we
> should compute the appropriate "continuation" as live value of the vectorized
> IVs even when they were not originally used outside of the loop.  I don't see
> how we can express this in terms of the scalar IVs in the (not yet) vectorized
> loop - similar to the reduction case you are going to end up with the wrong
> values here.
> 
> That said, I've for a long time wanted to preserve the original control IV 
> also for
> the vector code (leaving any "optimization"
> to IVOPTs there), that would enable us to compute the correct
> "niters_vector_mult_vf" based on that IV.
> 
> So given we cannot use the scalar IVs you have to handle all inductions
> (besides the main exit control IV) in vectorizable_live_operation I think.
>


That's what I currently do, that's why there was the
              if (STMT_VINFO_LIVE_P (phi_info))
                continue;

although I don't understand why we use the scalar count,  I suppose the 
reasoning
is that we don't really want to keep it around, and referencing it forces it to 
be kept?

At the moment it just does `init + (final - init) * vf` which is correct no?

Also you missed the question below about how to avoid the creation of the block,
You ok with changing that?

Thanks,
Tamar

> Or for now disable early-break for inductions that are not the main exit 
> control
> IV (in vect_can_advance_ivs_p)?
> 
> > > > >
> > > > > It seems your change handles different kinds of inductions 
> > > > > differently.
> > > > > Specifically
> > > > >
> > > > >       bool ivtemp = gimple_cond_lhs (cond) == iv_var;
> > > > >       if (restart_loop && ivtemp)
> > > > >         {
> > > > >           type = TREE_TYPE (gimple_phi_result (phi));
> > > > >           ni = build_int_cst (type, vf);
> > > > >           if (inversed_iv)
> > > > >             ni = fold_build2 (MINUS_EXPR, type, ni,
> > > > >                               fold_convert (type, step_expr));
> > > > >         }
> > > > >
> > > > > it looks like for the exit test IV we use either 'VF' or 'VF - step'
> > > > > as the new value.  That seems to be very odd special casing for
> > > > > unknown reasons.  And while you adjust vec_step_op_add, you
> > > > > don't adjust vect_peel_nonlinear_iv_init (maybe not supported -
> > > > > better assert
> > > here).
> > > >
> > > > The VF case is for a normal "non-inverted" loop, where if you take
> > > > an early exit you know that you have to do at most VF iterations.
> > > > The VF
> > > > - step is to account for the inverted loop control flow where you
> > > > exit after adjusting the IV already by + step.
> > >
> > > But doesn't that assume the IV counts from niter to zero?  I don't
> > > see this special case is actually necessary, no?
> > >
> >
> > I needed it because otherwise the scalar loop iterates one iteration
> > too little So I got a miscompile with the inverter loop stuff.  I'll
> > look at it again perhaps It can be solved differently.
> >
> > > >
> > > > Peeling doesn't matter here, since you know you were able to do a
> > > > vector iteration so it's safe to do VF iterations.  So having
> > > > peeled doesn't affect the remaining iters count.
> > > >
> > > > >
> > > > > Also the vec_step_op_add case will keep the original scalar IV
> > > > > live even when it is a vectorized induction.  The code
> > > > > recomputing the value from scratch avoids this.
> > > > >
> > > > >       /* For non-main exit create an intermediat edge to get any 
> > > > > updated
> iv
> > > > >          calculations.  */
> > > > >       if (needs_interm_block
> > > > >           && !iv_block
> > > > >           && (!gimple_seq_empty_p (stmts) || !gimple_seq_empty_p
> > > > > (new_stmts)))
> > > > >         {
> > > > >           iv_block = split_edge (update_e);
> > > > >           update_e = single_succ_edge (update_e->dest);
> > > > >           last_gsi = gsi_last_bb (iv_block);
> > > > >         }
> > > > >
> > > > > this is also odd, can we adjust the API instead?  I suppose this
> > > > > is because your computation uses the original loop IV, if you
> > > > > based the computation off the initial value only this might not be
> necessary?
> > > >
> > > > No, on the main exit the code updates the value in the loop header
> > > > and puts the Calculation in the merge block.  This works because
> > > > it only needs to consume PHI nodes in the merge block and things
> > > > like niters are
> > > adjusted in the guard block.
> > > >
> > > > For an early exit, we don't have a guard block, only the merge block.
> > > > We have to update the PHI nodes in that block,  but can't do so
> > > > since you can't produce a value and consume it in a PHI node in the same
> BB.
> > > > So we need to create the block to put the values in for use in the
> > > > merge block.  Because there's no "guard" block for early exits.
> > >
> > > ?  then compute niters in that block as well.
> >
> > We can't since it'll not be reachable through the right edge.  What we
> > can do if you want is slightly change peeling, we currently peel as:
> >
> >   \        \             /
> >   E1     E2        Normal exit
> >     \       |          |
> >        \    |          Guard
> >           \ |          |
> >          Merge block
> >                   |
> >              Pre Header
> >
> > If we instead peel as:
> >
> >
> >   \        \             /
> >   E1     E2        Normal exit
> >     \       |          |
> >        Exit join   Guard
> >           \ |          |
> >          Merge block
> >                   |
> >              Pre Header
> >
> > We can use the exit join block.  This would also mean
> > vect_update_ivs_after_vectorizer Doesn't need to iterate over all
> > exits and only really needs to adjust the phi nodes Coming out of the exit 
> > join
> and guard block.
> >
> > Does this work for you?
> >
> > Thanks,
> > Tamar
> > >
> > > > The API can be adjusted by always creating the empty block either
> > > > during
> > > peeling.
> > > > That would prevent us from having to do anything special here.
> > > > Would that work better?  Or I can do it in the loop that iterates
> > > > over the exits to before the call to
> > > > vect_update_ivs_after_vectorizer, which I think
> > > might be more consistent.
> > > >
> > > > >
> > > > > That said, I wonder why we cannot simply pass in an adjusted
> > > > > niter which would be niters_vector_mult_vf - vf and be done with that?
> > > > >
> > > >
> > > > We can ofcourse not have this and recompute it from niters itself,
> > > > however this does affect the epilog code layout. Particularly
> > > > knowing the static number if iterations left causes it to usually
> > > > unroll the loop and share some of the computations.  i.e. the
> > > > scalar code is often more
> > > efficient.
> > > >
> > > > The computation would be niters_vector_mult_vf - iters_done * vf,
> > > > since the value put Here is the remaining iteration count.  It's
> > > > static for early
> > > exits.
> > >
> > > Well, it might be "static" in that it doesn't really matter what you
> > > use for the epilog main IV initial value as long as you are sure
> > > you're not going to take that exit as you are sure we're going to
> > > take one of the early exits.  So yeah, the special code is probably
> > > OK, but it needs a better comment and as said the structure of
> vect_update_ivs_after_vectorizer is a bit hard to follow now.
> > >
> > > As said an important part for optimization is to not keep the scalar
> > > IVs live in the vector loop.
> > >
> > > > But can do whatever you prefer here.  Let me know what you prefer
> > > > for the
> > > above.
> > > >
> > > > Thanks,
> > > > Tamar
> > > >
> > > > > Thanks,
> > > > > Richard.
> > > > >
> > > > >
> > > > > > Regards,
> > > > > > Tamar
> > > > > > >
> > > > > > > > It has to do this since you have to perform the side
> > > > > > > > effects for the non-matching elements still.
> > > > > > > >
> > > > > > > > Regards,
> > > > > > > > Tamar
> > > > > > > >
> > > > > > > > >
> > > > > > > > > > +         if (STMT_VINFO_LIVE_P (phi_info))
> > > > > > > > > > +           continue;
> > > > > > > > > > +
> > > > > > > > > > +         /* For early break the final loop IV is:
> > > > > > > > > > +            init + (final - init) * vf which takes into 
> > > > > > > > > > account
> peeling
> > > > > > > > > > +            values and non-single steps.  The main exit
> can
> > > > > > > > > > +use
> > > > > niters
> > > > > > > > > > +            since if you exit from the main exit you've
> done
> > > > > > > > > > +all
> > > > > vector
> > > > > > > > > > +            iterations.  For an early exit we don't know
> when
> > > > > > > > > > +we
> > > > > exit
> > > > > > > > > > +so
> > > > > > > > > we
> > > > > > > > > > +            must re-calculate this on the exit.  */
> > > > > > > > > > +         tree start_expr = gimple_phi_result (phi);
> > > > > > > > > > +         off = fold_build2 (MINUS_EXPR, stype,
> > > > > > > > > > +                            fold_convert (stype,
> start_expr),
> > > > > > > > > > +                            fold_convert (stype,
> init_expr));
> > > > > > > > > > +         /* Now adjust for VF to get the final iteration 
> > > > > > > > > > value.
> */
> > > > > > > > > > +         off = fold_build2 (MULT_EXPR, stype, off,
> > > > > > > > > > +                            build_int_cst (stype, vf));
> > > > > > > > > > +       }
> > > > > > > > > > +     else
> > > > > > > > > > +       off = fold_build2 (MULT_EXPR, stype,
> > > > > > > > > > +                          fold_convert (stype, niters),
> step_expr);
> > > > > > > > > > +
> > > > > > > > > >       if (POINTER_TYPE_P (type))
> > > > > > > > > >         ni = fold_build_pointer_plus (init_expr, off);
> > > > > > > > > >       else
> > > > > > > > > > @@ -2238,6 +2286,8 @@ vect_update_ivs_after_vectorizer
> > > > > > > > > > (loop_vec_info
> > > > > > > > > loop_vinfo,
> > > > > > > > > >        /* Don't bother call vect_peel_nonlinear_iv_init.  */
> > > > > > > > > >        else if (induction_type == vect_step_op_neg)
> > > > > > > > > >     ni = init_expr;
> > > > > > > > > > +      else if (restart_loop)
> > > > > > > > > > +   continue;
> > > > > > > > >
> > > > > > > > > This looks all a bit complicated - why wouldn't we
> > > > > > > > > simply always use the PHI result when 'restart_loop'?
> > > > > > > > > Isn't that the correct old start value in
> > > > > > > all cases?
> > > > > > > > >
> > > > > > > > > >        else
> > > > > > > > > >     ni = vect_peel_nonlinear_iv_init (&stmts, init_expr,
> > > > > > > > > >                                       niters, step_expr, @@ 
> > > > > > > > > > -
> > > 2245,9 +2295,20 @@
> > > > > > > > > > vect_update_ivs_after_vectorizer
> > > > > > > > > (loop_vec_info
> > > > > > > > > > loop_vinfo,
> > > > > > > > > >
> > > > > > > > > >        var = create_tmp_var (type, "tmp");
> > > > > > > > > >
> > > > > > > > > > -      last_gsi = gsi_last_bb (exit_bb);
> > > > > > > > > >        gimple_seq new_stmts = NULL;
> > > > > > > > > >        ni_name = force_gimple_operand (ni, &new_stmts,
> > > > > > > > > > false, var);
> > > > > > > > > > +
> > > > > > > > > > +      /* For non-main exit create an intermediat edge
> > > > > > > > > > + to get any
> > > > > updated iv
> > > > > > > > > > +    calculations.  */
> > > > > > > > > > +      if (needs_interm_block
> > > > > > > > > > +     && !iv_block
> > > > > > > > > > +     && (!gimple_seq_empty_p (stmts) ||
> > > > > > > > > > +!gimple_seq_empty_p
> > > > > > > > > (new_stmts)))
> > > > > > > > > > +   {
> > > > > > > > > > +     iv_block = split_edge (update_e);
> > > > > > > > > > +     update_e = single_succ_edge (update_e->dest);
> > > > > > > > > > +     last_gsi = gsi_last_bb (iv_block);
> > > > > > > > > > +   }
> > > > > > > > > > +
> > > > > > > > > >        /* Exit_bb shouldn't be empty.  */
> > > > > > > > > >        if (!gsi_end_p (last_gsi))
> > > > > > > > > >     {
> > > > > > > > > > @@ -3342,8 +3403,26 @@ vect_do_peeling (loop_vec_info
> > > > > > > > > > loop_vinfo, tree
> > > > > > > > > niters, tree nitersm1,
> > > > > > > > > >      niters_vector_mult_vf steps.  */
> > > > > > > > > >        gcc_checking_assert (vect_can_advance_ivs_p
> (loop_vinfo));
> > > > > > > > > >        update_e = skip_vector ? e : loop_preheader_edge 
> > > > > > > > > > (epilog);
> > > > > > > > > > -      vect_update_ivs_after_vectorizer (loop_vinfo,
> > > > > niters_vector_mult_vf,
> > > > > > > > > > -                                   update_e);
> > > > > > > > > > +      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > > > > > > > +   update_e = single_succ_edge (e->dest);
> > > > > > > > > > +      bool inversed_iv
> > > > > > > > > > +   = !vect_is_loop_exit_latch_pred
> (LOOP_VINFO_IV_EXIT
> > > > > (loop_vinfo),
> > > > > > > > > > +                                    LOOP_VINFO_LOOP
> > > > > (loop_vinfo));
> > > > > > > > >
> > > > > > > > > You are computing this here and in
> > > vect_update_ivs_after_vectorizer?
> > > > > > > > >
> > > > > > > > > > +
> > > > > > > > > > +      /* Update the main exit first.  */
> > > > > > > > > > +      vect_update_ivs_after_vectorizer (loop_vinfo,
> > > > > > > > > > + vf,
> > > > > > > niters_vector_mult_vf,
> > > > > > > > > > +                                   update_e,
> inversed_iv);
> > > > > > > > > > +
> > > > > > > > > > +      /* And then update the early exits.  */
> > > > > > > > > > +      for (auto exit : get_loop_exit_edges (loop))
> > > > > > > > > > +   {
> > > > > > > > > > +     if (exit == LOOP_VINFO_IV_EXIT (loop_vinfo))
> > > > > > > > > > +       continue;
> > > > > > > > > > +
> > > > > > > > > > +     vect_update_ivs_after_vectorizer (loop_vinfo, vf,
> > > > > > > > > > +
> niters_vector_mult_vf,
> > > > > > > > > > +                                       exit, true);
> > > > > > > > >
> > > > > > > > > ... why does the same not work here?  Wouldn't the
> > > > > > > > > proper condition be !dominated_by_p (CDI_DOMINATORS,
> > > > > > > > > exit->src, LOOP_VINFO_IV_EXIT
> > > > > > > > > (loop_vinfo)->src) or similar?  That is, whether the
> > > > > > > > > exit is at or after the main IV exit?  (consider having
> > > > > > > > > two)
> > > > > > > > >
> > > > > > > > > > +   }
> > > > > > > > > >
> > > > > > > > > >        if (skip_epilog)
> > > > > > > > > >     {
> > > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Richard Biener <rguent...@suse.de> SUSE Software Solutions
> > > > > > > Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany;
> > > > > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809,
> > > > > > > AG
> > > > > > > Nuernberg)
> > > > > >
> > > > >
> > > > > --
> > > > > Richard Biener <rguent...@suse.de> SUSE Software Solutions
> > > > > Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany;
> > > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> > > > > Nuernberg)
> > > >
> > >
> > > --
> > > Richard Biener <rguent...@suse.de>
> > > SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461
> > > Nuernberg, Germany;
> > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> > > Nuernberg)
> >
> 
> --
> Richard Biener <rguent...@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> Nuernberg)

RE: [PATCH 7/21]middle-end: update IV update code to support early breaks and arbitrary exits

Reply via email to