On Tue, 21 Oct 2025, Tamar Christina wrote:
> > -----Original Message-----
> > From: Richard Biener <[email protected]>
> > Sent: 21 October 2025 08:52
> > To: Tamar Christina <[email protected]>
> > Cc: [email protected]; nd <[email protected]>; [email protected]
> > Subject: Re: [PATCH 1/4][vect]: Add support for boolean reductions for VLA
> >
> > On Tue, 21 Oct 2025, Tamar Christina wrote:
> >
> > > The support for the new boolean reduction optabs didn't quite work for VLA
> > > because the code later on insists on the target still having a
> > > shift-and-insert
> > > optab.
> > >
> > > This is however not needed if the target can do the reduction using the
> > > new
> > > optabs. This change makes it an OR not and AND requirement to have
> > > shift-and-insert and the reduc sbool optabs.
> > >
> > > Bootstrapped Regtested on aarch64-none-linux-gnu,
> > > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
> > > -m32, -m64 and no issues
> > >
> > > Any objections for master?
> >
> > I wonder if the comment isn't still true - iff we have a neutral
> > {-1,-1...} and the start value is 0, don't we still do what is
> > written there? IIRC the testcases I added all have constant
> > start values, so the interesting case is with unknown start value.
>
> That's true, I can see in get_initial_defs_for_reduction for constant
> start values we just splat, which is what I expected.
>
> But in vect_transform_cycle_phi with only one initial value we also
> Just seem to splat and defer the checking of the initial value until
> after the reduction,
>
> /* Try to simplify the vector initialization by applying an
> adjustment after the reduction has been performed. This
> can also break a critical path but on the other hand
> requires to keep the initial value live across the loop. */
> if (neutral_op
> && initial_values.length () == 1
> && !VECT_REDUC_INFO_REUSED_ACCUMULATOR (reduc_info)
> && STMT_VINFO_DEF_TYPE (stmt_info) == vect_reduction_def
> && !operand_equal_p (neutral_op, initial_values[0]))
>
> So the interesting case is where we have an SLP tree with two unknown
> non-constant values. Though it's unclear to me if we can defer the check
> for a single value, why not do it for all, surely the scalar final reductions
> are still cheaper as scalar then element wise insertion in the prologue?
The idea here is of course to avoid keeping the initial value life
across the loop. There's also !VECT_REDUC_INFO_REUSED_ACCUMULATOR
here.
> >
> > That said, when doing the bool reduction stuff my general impression
> > was that all the checks done could have some refactoring to make it
> > more obvious what check applies to what cases we have. As you show
> > here, it's not always obvious.
> >
>
> So it looks like the check isn't needed for a single non-constant value.
> I wonder.. is VECT_REDUC_INFO_INITIAL_VALUES set during
> Vectorizable_reduction. From the code it looks like I need an additional
> check on SLP_TREE_LANES (slp_node) != 1?
Maybe simply avoiding the initial value insertion when we cannot
insert it ... I don't think we put this into costing in any way
(we also do not compute whether we re-use an accumulator for the
epilog, we just set things up in a way that it could be, but all
during transform ...).
The initial_values.legnth () == 1 check is for SLP reductions where
we cannot perform the adjustment in the epilogue because we only
track the initial value of one reduction, not multiple ones as we'd
need for SLP reductions (and the "live" issue would become worse there).
So, you'd need to check a SLP bool reduction (two bool accumulators).
Richard.
> I'll write some examples and see.
>
> Thanks,
> Tamar
>
> > Richard.
> >
> > > Thanks,
> > > Tamar
> > >
> > > gcc/ChangeLog:
> > >
> > > * tree-vect-loop.cc (vectorizable_reduction): Don't require
> > > IFN_VEC_SHL_INSERT when using reduc sbool optabs.
> > >
> > > ---
> > > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > > index
> > 6c2420249718237c4f70720b2bd03d4951bd8a5d..ca9ff35c0bb8d8c4a161
> > a922bae9c19028492b66 100644
> > > --- a/gcc/tree-vect-loop.cc
> > > +++ b/gcc/tree-vect-loop.cc
> > > @@ -7565,6 +7565,7 @@ vectorizable_reduction (loop_vec_info
> > loop_vinfo,
> > > values into the low-numbered elements. */
> > > if ((double_reduc || neutral_op)
> > > && !nunits_out.is_constant ()
> > > + && (reduc_fn == IFN_LAST || !VECTOR_BOOLEAN_TYPE_P
> > (vectype_out))
> > > && !direct_internal_fn_supported_p (IFN_VEC_SHL_INSERT,
> > > vectype_out,
> > OPTIMIZE_FOR_SPEED))
> > > {
> > >
> > >
> > >
> >
> > --
> > Richard Biener <[email protected]>
> > SUSE Software Solutions Germany GmbH,
> > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> > Nuernberg)
>
--
Richard Biener <[email protected]>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)