On Tue, 21 Oct 2025, Tamar Christina wrote:

> > -----Original Message-----
> > From: Richard Biener <[email protected]>
> > Sent: 21 October 2025 10:27
> > To: Tamar Christina <[email protected]>
> > Cc: [email protected]; nd <[email protected]>; [email protected]
> > Subject: RE: [PATCH 1/4][vect]: Add support for boolean reductions for VLA
> > 
> > On Tue, 21 Oct 2025, Tamar Christina wrote:
> > 
> > > > -----Original Message-----
> > > > From: Richard Biener <[email protected]>
> > > > Sent: 21 October 2025 08:52
> > > > To: Tamar Christina <[email protected]>
> > > > Cc: [email protected]; nd <[email protected]>; [email protected]
> > > > Subject: Re: [PATCH 1/4][vect]: Add support for boolean reductions for
> > VLA
> > > >
> > > > On Tue, 21 Oct 2025, Tamar Christina wrote:
> > > >
> > > > > The support for the new boolean reduction optabs didn't quite work for
> > VLA
> > > > > because the code later on insists on the target still having a 
> > > > > shift-and-
> > insert
> > > > > optab.
> > > > >
> > > > > This is however not needed if the target can do the reduction using 
> > > > > the
> > new
> > > > > optabs.  This change makes it an OR not and AND requirement to have
> > > > > shift-and-insert and the reduc sbool optabs.
> > > > >
> > > > > Bootstrapped Regtested on aarch64-none-linux-gnu,
> > > > > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
> > > > > -m32, -m64 and no issues
> > > > >
> > > > > Any objections for master?
> > > >
> > > > I wonder if the comment isn't still true - iff we have a neutral
> > > > {-1,-1...} and the start value is 0, don't we still do what is
> > > > written there?  IIRC the testcases I added all have constant
> > > > start values, so the interesting case is with unknown start value.
> > >
> > > That's true, I can see in get_initial_defs_for_reduction for constant
> > > start values we just splat, which is what I expected.
> > >
> > > But in vect_transform_cycle_phi with only one initial value we also
> > > Just seem to splat and defer the checking of the initial value until
> > > after the reduction,
> > >
> > >     /* Try to simplify the vector initialization by applying an
> > >        adjustment after the reduction has been performed.  This
> > >        can also break a critical path but on the other hand
> > >        requires to keep the initial value live across the loop.  */
> > >     if (neutral_op
> > >         && initial_values.length () == 1
> > >         && !VECT_REDUC_INFO_REUSED_ACCUMULATOR (reduc_info)
> > >         && STMT_VINFO_DEF_TYPE (stmt_info) == vect_reduction_def
> > >         && !operand_equal_p (neutral_op, initial_values[0]))
> > >
> > > So the interesting case is where we have an SLP tree with two unknown
> > > non-constant values. Though it's unclear to me if we can defer the check
> > > for a single value, why not do it for all, surely the scalar final 
> > > reductions
> > > are still cheaper as scalar then element wise insertion in the prologue?
> > 
> > The idea here is of course to avoid keeping the initial value life
> > across the loop.  There's also !VECT_REDUC_INFO_REUSED_ACCUMULATOR
> > here.
> > 
> 
> True, but they're of course scalar elements so you're less likely to have much
> register pressure issues.  The issue is that the operations for SHL_AND_INSERT
> can't be done for predicates.  At least not without round-tripping through 
> data.

Undestood.

> > > >
> > > > That said, when doing the bool reduction stuff my general impression
> > > > was that all the checks done could have some refactoring to make it
> > > > more obvious what check applies to what cases we have.  As you show
> > > > here, it's not always obvious.
> > > >
> > >
> > > So it looks like the check isn't needed for a single non-constant value.
> > > I wonder.. is VECT_REDUC_INFO_INITIAL_VALUES set during
> > > Vectorizable_reduction.  From the code it looks like I need an additional
> > > check on SLP_TREE_LANES (slp_node) != 1?
> > 
> > Maybe simply avoiding the initial value insertion when we cannot
> > insert it ... I don't think we put this into costing in any way
> > (we also do not compute whether we re-use an accumulator for the
> > epilog, we just set things up in a way that it could be, but all
> > during transform ...).
> 
> Yeah though this relaxation came 6 years after this code was added to check
> ror shift and insert :)
> 
> > 
> > The initial_values.legnth () == 1 check is for SLP reductions where
> > we cannot perform the adjustment in the epilogue because we only
> > track the initial value of one reduction, not multiple ones as we'd
> > need for SLP reductions (and the "live" issue would become worse there).
> > 
> > So, you'd need to check a SLP bool reduction (two bool accumulators).
> 
> So I tried
> 
> https://godbolt.org/z/zPEoaWPEv
> 
> But it looks like for the single initial value case the code you added in
> a54aa75ab30eb1a176ceaded507113252df24878 for PR 115438 doesn't
> trigger.

Yeah, that code added back what was there for the non-SLP case (I
hoped this epilogue adjustment thing was not important...)

> So sadly SLP_TREE_LANES (slp_node) != 1 doesn't work...

It would be (SLP_TREE_LANES (slp_node) != 1 && !reduc_chain) anyway,
but yes.

> So for now, I'm just testing
> 
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 6c242024971..0aa970ab338 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -7565,6 +7565,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
>       values into the low-numbered elements.  */
>    if ((double_reduc || neutral_op)
>        && !nunits_out.is_constant ()
> +      && !really_constant_p (vect_phi_initial_value (reduc_def_phi))
>        && !direct_internal_fn_supported_p (IFN_VEC_SHL_INSERT,
>                                           vectype_out, OPTIMIZE_FOR_SPEED))
>      {
> 
> Instead which is the case we know works, and it looks like we'll have to give
> boolean reduction handling some more TLC next year.

I wonder if we can simply set a "must-use-reduc_epilouge_adjustment"
flag if we get a not supported case here?  If there's any circumstance
we cannot use a reduc_epilogue_adjustment we'd have to bail of course.
>From what I can see that condition should indeed be
(SLP_TREE_LANES (slp_node) != 1 && !reduc_chain).  The rest is
really "heuristics" (IIRC even the VECT_REDUC_INFO_REUSED_ACCUMULATOR
guard).  Another simple check to add above is the
!operand_equal_p (neutral_op, initial_values[0]) check.

Richard.

> Cheers,
> Tamar
> 
> > 
> > Richard.
> > 
> > 
> > > I'll write some examples and see.
> > >
> > > Thanks,
> > > Tamar
> > >
> > > > Richard.
> > > >
> > > > > Thanks,
> > > > > Tamar
> > > > >
> > > > > gcc/ChangeLog:
> > > > >
> > > > >       * tree-vect-loop.cc (vectorizable_reduction): Don't require
> > > > >       IFN_VEC_SHL_INSERT when using reduc sbool optabs.
> > > > >
> > > > > ---
> > > > > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > > > > index
> > > >
> > 6c2420249718237c4f70720b2bd03d4951bd8a5d..ca9ff35c0bb8d8c4a161
> > > > a922bae9c19028492b66 100644
> > > > > --- a/gcc/tree-vect-loop.cc
> > > > > +++ b/gcc/tree-vect-loop.cc
> > > > > @@ -7565,6 +7565,7 @@ vectorizable_reduction (loop_vec_info
> > > > loop_vinfo,
> > > > >       values into the low-numbered elements.  */
> > > > >    if ((double_reduc || neutral_op)
> > > > >        && !nunits_out.is_constant ()
> > > > > +      && (reduc_fn == IFN_LAST || !VECTOR_BOOLEAN_TYPE_P
> > > > (vectype_out))
> > > > >        && !direct_internal_fn_supported_p (IFN_VEC_SHL_INSERT,
> > > > >                                         vectype_out,
> > > > OPTIMIZE_FOR_SPEED))
> > > > >      {
> > > > >
> > > > >
> > > > >
> > > >
> > > > --
> > > > Richard Biener <[email protected]>
> > > > SUSE Software Solutions Germany GmbH,
> > > > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> > > > Nuernberg)
> > >
> > 
> > --
> > Richard Biener <[email protected]>
> > SUSE Software Solutions Germany GmbH,
> > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> > Nuernberg)
> 

-- 
Richard Biener <[email protected]>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Reply via email to