On Tue, 21 Oct 2025, Tamar Christina wrote:

> > -----Original Message-----
> > From: Richard Biener <[email protected]>
> > Sent: 21 October 2025 12:13
> > To: Tamar Christina <[email protected]>
> > Cc: [email protected]; nd <[email protected]>; [email protected]
> > Subject: RE: [PATCH 1/4][vect]: Add support for boolean reductions for VLA
> > 
> > On Tue, 21 Oct 2025, Tamar Christina wrote:
> > 
> > > > -----Original Message-----
> > > > From: Richard Biener <[email protected]>
> > > > Sent: 21 October 2025 10:27
> > > > To: Tamar Christina <[email protected]>
> > > > Cc: [email protected]; nd <[email protected]>; [email protected]
> > > > Subject: RE: [PATCH 1/4][vect]: Add support for boolean reductions for
> > VLA
> > > >
> > > > On Tue, 21 Oct 2025, Tamar Christina wrote:
> > > >
> > > > > > -----Original Message-----
> > > > > > From: Richard Biener <[email protected]>
> > > > > > Sent: 21 October 2025 08:52
> > > > > > To: Tamar Christina <[email protected]>
> > > > > > Cc: [email protected]; nd <[email protected]>;
> > [email protected]
> > > > > > Subject: Re: [PATCH 1/4][vect]: Add support for boolean reductions 
> > > > > > for
> > > > VLA
> > > > > >
> > > > > > On Tue, 21 Oct 2025, Tamar Christina wrote:
> > > > > >
> > > > > > > The support for the new boolean reduction optabs didn't quite work
> > for
> > > > VLA
> > > > > > > because the code later on insists on the target still having a 
> > > > > > > shift-and-
> > > > insert
> > > > > > > optab.
> > > > > > >
> > > > > > > This is however not needed if the target can do the reduction 
> > > > > > > using
> > the
> > > > new
> > > > > > > optabs.  This change makes it an OR not and AND requirement to 
> > > > > > > have
> > > > > > > shift-and-insert and the reduc sbool optabs.
> > > > > > >
> > > > > > > Bootstrapped Regtested on aarch64-none-linux-gnu,
> > > > > > > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
> > > > > > > -m32, -m64 and no issues
> > > > > > >
> > > > > > > Any objections for master?
> > > > > >
> > > > > > I wonder if the comment isn't still true - iff we have a neutral
> > > > > > {-1,-1...} and the start value is 0, don't we still do what is
> > > > > > written there?  IIRC the testcases I added all have constant
> > > > > > start values, so the interesting case is with unknown start value.
> > > > >
> > > > > That's true, I can see in get_initial_defs_for_reduction for constant
> > > > > start values we just splat, which is what I expected.
> > > > >
> > > > > But in vect_transform_cycle_phi with only one initial value we also
> > > > > Just seem to splat and defer the checking of the initial value until
> > > > > after the reduction,
> > > > >
> > > > >         /* Try to simplify the vector initialization by applying an
> > > > >            adjustment after the reduction has been performed.  This
> > > > >            can also break a critical path but on the other hand
> > > > >            requires to keep the initial value live across the loop.  
> > > > > */
> > > > >         if (neutral_op
> > > > >             && initial_values.length () == 1
> > > > >             && !VECT_REDUC_INFO_REUSED_ACCUMULATOR (reduc_info)
> > > > >             && STMT_VINFO_DEF_TYPE (stmt_info) == vect_reduction_def
> > > > >             && !operand_equal_p (neutral_op, initial_values[0]))
> > > > >
> > > > > So the interesting case is where we have an SLP tree with two unknown
> > > > > non-constant values. Though it's unclear to me if we can defer the 
> > > > > check
> > > > > for a single value, why not do it for all, surely the scalar final 
> > > > > reductions
> > > > > are still cheaper as scalar then element wise insertion in the 
> > > > > prologue?
> > > >
> > > > The idea here is of course to avoid keeping the initial value life
> > > > across the loop.  There's also
> > !VECT_REDUC_INFO_REUSED_ACCUMULATOR
> > > > here.
> > > >
> > >
> > > True, but they're of course scalar elements so you're less likely to have 
> > > much
> > > register pressure issues.  The issue is that the operations for
> > SHL_AND_INSERT
> > > can't be done for predicates.  At least not without round-tripping through
> > data.
> > 
> > Undestood.
> > 
> > > > > >
> > > > > > That said, when doing the bool reduction stuff my general impression
> > > > > > was that all the checks done could have some refactoring to make it
> > > > > > more obvious what check applies to what cases we have.  As you show
> > > > > > here, it's not always obvious.
> > > > > >
> > > > >
> > > > > So it looks like the check isn't needed for a single non-constant 
> > > > > value.
> > > > > I wonder.. is VECT_REDUC_INFO_INITIAL_VALUES set during
> > > > > Vectorizable_reduction.  From the code it looks like I need an 
> > > > > additional
> > > > > check on SLP_TREE_LANES (slp_node) != 1?
> > > >
> > > > Maybe simply avoiding the initial value insertion when we cannot
> > > > insert it ... I don't think we put this into costing in any way
> > > > (we also do not compute whether we re-use an accumulator for the
> > > > epilog, we just set things up in a way that it could be, but all
> > > > during transform ...).
> > >
> > > Yeah though this relaxation came 6 years after this code was added to 
> > > check
> > > ror shift and insert :)
> > >
> > > >
> > > > The initial_values.legnth () == 1 check is for SLP reductions where
> > > > we cannot perform the adjustment in the epilogue because we only
> > > > track the initial value of one reduction, not multiple ones as we'd
> > > > need for SLP reductions (and the "live" issue would become worse there).
> > > >
> > > > So, you'd need to check a SLP bool reduction (two bool accumulators).
> > >
> > > So I tried
> > >
> > > https://godbolt.org/z/zPEoaWPEv
> > >
> > > But it looks like for the single initial value case the code you added in
> > > a54aa75ab30eb1a176ceaded507113252df24878 for PR 115438 doesn't
> > > trigger.
> > 
> > Yeah, that code added back what was there for the non-SLP case (I
> > hoped this epilogue adjustment thing was not important...)
> > 
> > > So sadly SLP_TREE_LANES (slp_node) != 1 doesn't work...
> > 
> > It would be (SLP_TREE_LANES (slp_node) != 1 && !reduc_chain) anyway,
> > but yes.
> > 
> > > So for now, I'm just testing
> > >
> > > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > > index 6c242024971..0aa970ab338 100644
> > > --- a/gcc/tree-vect-loop.cc
> > > +++ b/gcc/tree-vect-loop.cc
> > > @@ -7565,6 +7565,7 @@ vectorizable_reduction (loop_vec_info
> > loop_vinfo,
> > >       values into the low-numbered elements.  */
> > >    if ((double_reduc || neutral_op)
> > >        && !nunits_out.is_constant ()
> > > +      && !really_constant_p (vect_phi_initial_value (reduc_def_phi))
> > >        && !direct_internal_fn_supported_p (IFN_VEC_SHL_INSERT,
> > >                                           vectype_out, 
> > > OPTIMIZE_FOR_SPEED))
> > >      {
> > >
> > > Instead which is the case we know works, and it looks like we'll have to 
> > > give
> > > boolean reduction handling some more TLC next year.
> > 
> > I wonder if we can simply set a "must-use-reduc_epilouge_adjustment"
> > flag if we get a not supported case here?  If there's any circumstance
> > we cannot use a reduc_epilogue_adjustment we'd have to bail of course.
> 
> We don't have an analysis phase for vect_transform_cycle_phi though since it's
> a transform only method.  It seems to be that vectorizable_reduction is more
> restrictive than the transform phases that it triggers 
> (vect_transform_cycle_phi,
> vect_transform_reduction).
> 
> So I'm not sure where else we'd reject the must-use-reduc_epilogue though. So 
> I
> think it likely has to be one way?

The analysis phase for vect_transform_cycle_phi and vect_transform_reduction
and vect_create_epilog_for_reduction (via vectorizable_live_operation) is
vectorizable_reduction.  So we indeed have to reject the non-workable case
there but we can use the flag to guide the heuristics in
vect_transform_cycle_phi (short of replicating exactly the same 
conditions).

What I was saying there is that we still have (many) ad-hoc decisions
made during transform which we should decide on during 
vectorizable_reduction and record as decision in a more simple form
(that ideally includes whether we can re-use an accumulator, though
that's a difficult thing as epilogue analysis has its own idea here).

Richard.

> > From what I can see that condition should indeed be
> > (SLP_TREE_LANES (slp_node) != 1 && !reduc_chain).  The rest is
> > really "heuristics" (IIRC even the
> > VECT_REDUC_INFO_REUSED_ACCUMULATOR
> > guard).  Another simple check to add above is the
> > !operand_equal_p (neutral_op, initial_values[0]) check.
> > 
> 
> Hmm, fair, that can replace the constant check, let me try that instead.
> 
> Thanks,
> Tamar
> 
> 
> > Richard.
> > 
> > > Cheers,
> > > Tamar
> > >
> > > >
> > > > Richard.
> > > >
> > > >
> > > > > I'll write some examples and see.
> > > > >
> > > > > Thanks,
> > > > > Tamar
> > > > >
> > > > > > Richard.
> > > > > >
> > > > > > > Thanks,
> > > > > > > Tamar
> > > > > > >
> > > > > > > gcc/ChangeLog:
> > > > > > >
> > > > > > >   * tree-vect-loop.cc (vectorizable_reduction): Don't require
> > > > > > >   IFN_VEC_SHL_INSERT when using reduc sbool optabs.
> > > > > > >
> > > > > > > ---
> > > > > > > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > > > > > > index
> > > > > >
> > > >
> > 6c2420249718237c4f70720b2bd03d4951bd8a5d..ca9ff35c0bb8d8c4a161
> > > > > > a922bae9c19028492b66 100644
> > > > > > > --- a/gcc/tree-vect-loop.cc
> > > > > > > +++ b/gcc/tree-vect-loop.cc
> > > > > > > @@ -7565,6 +7565,7 @@ vectorizable_reduction (loop_vec_info
> > > > > > loop_vinfo,
> > > > > > >       values into the low-numbered elements.  */
> > > > > > >    if ((double_reduc || neutral_op)
> > > > > > >        && !nunits_out.is_constant ()
> > > > > > > +      && (reduc_fn == IFN_LAST || !VECTOR_BOOLEAN_TYPE_P
> > > > > > (vectype_out))
> > > > > > >        && !direct_internal_fn_supported_p (IFN_VEC_SHL_INSERT,
> > > > > > >                                     vectype_out,
> > > > > > OPTIMIZE_FOR_SPEED))
> > > > > > >      {
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > > > --
> > > > > > Richard Biener <[email protected]>
> > > > > > SUSE Software Solutions Germany GmbH,
> > > > > > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > > > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> > > > > > Nuernberg)
> > > > >
> > > >
> > > > --
> > > > Richard Biener <[email protected]>
> > > > SUSE Software Solutions Germany GmbH,
> > > > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> > > > Nuernberg)
> > >
> > 
> > --
> > Richard Biener <[email protected]>
> > SUSE Software Solutions Germany GmbH,
> > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> > Nuernberg)
> 

-- 
Richard Biener <[email protected]>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Reply via email to