> -----Original Message-----
> From: Richard Biener <[email protected]>
> Sent: 21 October 2025 10:27
> To: Tamar Christina <[email protected]>
> Cc: [email protected]; nd <[email protected]>; [email protected]
> Subject: RE: [PATCH 1/4][vect]: Add support for boolean reductions for VLA
> 
> On Tue, 21 Oct 2025, Tamar Christina wrote:
> 
> > > -----Original Message-----
> > > From: Richard Biener <[email protected]>
> > > Sent: 21 October 2025 08:52
> > > To: Tamar Christina <[email protected]>
> > > Cc: [email protected]; nd <[email protected]>; [email protected]
> > > Subject: Re: [PATCH 1/4][vect]: Add support for boolean reductions for
> VLA
> > >
> > > On Tue, 21 Oct 2025, Tamar Christina wrote:
> > >
> > > > The support for the new boolean reduction optabs didn't quite work for
> VLA
> > > > because the code later on insists on the target still having a 
> > > > shift-and-
> insert
> > > > optab.
> > > >
> > > > This is however not needed if the target can do the reduction using the
> new
> > > > optabs.  This change makes it an OR not and AND requirement to have
> > > > shift-and-insert and the reduc sbool optabs.
> > > >
> > > > Bootstrapped Regtested on aarch64-none-linux-gnu,
> > > > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
> > > > -m32, -m64 and no issues
> > > >
> > > > Any objections for master?
> > >
> > > I wonder if the comment isn't still true - iff we have a neutral
> > > {-1,-1...} and the start value is 0, don't we still do what is
> > > written there?  IIRC the testcases I added all have constant
> > > start values, so the interesting case is with unknown start value.
> >
> > That's true, I can see in get_initial_defs_for_reduction for constant
> > start values we just splat, which is what I expected.
> >
> > But in vect_transform_cycle_phi with only one initial value we also
> > Just seem to splat and defer the checking of the initial value until
> > after the reduction,
> >
> >       /* Try to simplify the vector initialization by applying an
> >          adjustment after the reduction has been performed.  This
> >          can also break a critical path but on the other hand
> >          requires to keep the initial value live across the loop.  */
> >       if (neutral_op
> >           && initial_values.length () == 1
> >           && !VECT_REDUC_INFO_REUSED_ACCUMULATOR (reduc_info)
> >           && STMT_VINFO_DEF_TYPE (stmt_info) == vect_reduction_def
> >           && !operand_equal_p (neutral_op, initial_values[0]))
> >
> > So the interesting case is where we have an SLP tree with two unknown
> > non-constant values. Though it's unclear to me if we can defer the check
> > for a single value, why not do it for all, surely the scalar final 
> > reductions
> > are still cheaper as scalar then element wise insertion in the prologue?
> 
> The idea here is of course to avoid keeping the initial value life
> across the loop.  There's also !VECT_REDUC_INFO_REUSED_ACCUMULATOR
> here.
> 

True, but they're of course scalar elements so you're less likely to have much
register pressure issues.  The issue is that the operations for SHL_AND_INSERT
can't be done for predicates.  At least not without round-tripping through data.

> > >
> > > That said, when doing the bool reduction stuff my general impression
> > > was that all the checks done could have some refactoring to make it
> > > more obvious what check applies to what cases we have.  As you show
> > > here, it's not always obvious.
> > >
> >
> > So it looks like the check isn't needed for a single non-constant value.
> > I wonder.. is VECT_REDUC_INFO_INITIAL_VALUES set during
> > Vectorizable_reduction.  From the code it looks like I need an additional
> > check on SLP_TREE_LANES (slp_node) != 1?
> 
> Maybe simply avoiding the initial value insertion when we cannot
> insert it ... I don't think we put this into costing in any way
> (we also do not compute whether we re-use an accumulator for the
> epilog, we just set things up in a way that it could be, but all
> during transform ...).

Yeah though this relaxation came 6 years after this code was added to check
ror shift and insert :)

> 
> The initial_values.legnth () == 1 check is for SLP reductions where
> we cannot perform the adjustment in the epilogue because we only
> track the initial value of one reduction, not multiple ones as we'd
> need for SLP reductions (and the "live" issue would become worse there).
> 
> So, you'd need to check a SLP bool reduction (two bool accumulators).

So I tried

https://godbolt.org/z/zPEoaWPEv

But it looks like for the single initial value case the code you added in
a54aa75ab30eb1a176ceaded507113252df24878 for PR 115438 doesn't
trigger. So sadly SLP_TREE_LANES (slp_node) != 1 doesn't work...

So for now, I'm just testing

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 6c242024971..0aa970ab338 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -7565,6 +7565,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
      values into the low-numbered elements.  */
   if ((double_reduc || neutral_op)
       && !nunits_out.is_constant ()
+      && !really_constant_p (vect_phi_initial_value (reduc_def_phi))
       && !direct_internal_fn_supported_p (IFN_VEC_SHL_INSERT,
                                          vectype_out, OPTIMIZE_FOR_SPEED))
     {

Instead which is the case we know works, and it looks like we'll have to give
boolean reduction handling some more TLC next year.

Cheers,
Tamar

> 
> Richard.
> 
> 
> > I'll write some examples and see.
> >
> > Thanks,
> > Tamar
> >
> > > Richard.
> > >
> > > > Thanks,
> > > > Tamar
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > >         * tree-vect-loop.cc (vectorizable_reduction): Don't require
> > > >         IFN_VEC_SHL_INSERT when using reduc sbool optabs.
> > > >
> > > > ---
> > > > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > > > index
> > >
> 6c2420249718237c4f70720b2bd03d4951bd8a5d..ca9ff35c0bb8d8c4a161
> > > a922bae9c19028492b66 100644
> > > > --- a/gcc/tree-vect-loop.cc
> > > > +++ b/gcc/tree-vect-loop.cc
> > > > @@ -7565,6 +7565,7 @@ vectorizable_reduction (loop_vec_info
> > > loop_vinfo,
> > > >       values into the low-numbered elements.  */
> > > >    if ((double_reduc || neutral_op)
> > > >        && !nunits_out.is_constant ()
> > > > +      && (reduc_fn == IFN_LAST || !VECTOR_BOOLEAN_TYPE_P
> > > (vectype_out))
> > > >        && !direct_internal_fn_supported_p (IFN_VEC_SHL_INSERT,
> > > >                                           vectype_out,
> > > OPTIMIZE_FOR_SPEED))
> > > >      {
> > > >
> > > >
> > > >
> > >
> > > --
> > > Richard Biener <[email protected]>
> > > SUSE Software Solutions Germany GmbH,
> > > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> > > Nuernberg)
> >
> 
> --
> Richard Biener <[email protected]>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> Nuernberg)

Reply via email to