On Tue, 21 Oct 2025, Tamar Christina wrote: > > -----Original Message----- > > From: Richard Biener <[email protected]> > > Sent: 21 October 2025 10:27 > > To: Tamar Christina <[email protected]> > > Cc: [email protected]; nd <[email protected]>; [email protected] > > Subject: RE: [PATCH 1/4][vect]: Add support for boolean reductions for VLA > > > > On Tue, 21 Oct 2025, Tamar Christina wrote: > > > > > > -----Original Message----- > > > > From: Richard Biener <[email protected]> > > > > Sent: 21 October 2025 08:52 > > > > To: Tamar Christina <[email protected]> > > > > Cc: [email protected]; nd <[email protected]>; [email protected] > > > > Subject: Re: [PATCH 1/4][vect]: Add support for boolean reductions for > > VLA > > > > > > > > On Tue, 21 Oct 2025, Tamar Christina wrote: > > > > > > > > > The support for the new boolean reduction optabs didn't quite work for > > VLA > > > > > because the code later on insists on the target still having a > > > > > shift-and- > > insert > > > > > optab. > > > > > > > > > > This is however not needed if the target can do the reduction using > > > > > the > > new > > > > > optabs. This change makes it an OR not and AND requirement to have > > > > > shift-and-insert and the reduc sbool optabs. > > > > > > > > > > Bootstrapped Regtested on aarch64-none-linux-gnu, > > > > > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu > > > > > -m32, -m64 and no issues > > > > > > > > > > Any objections for master? > > > > > > > > I wonder if the comment isn't still true - iff we have a neutral > > > > {-1,-1...} and the start value is 0, don't we still do what is > > > > written there? IIRC the testcases I added all have constant > > > > start values, so the interesting case is with unknown start value. > > > > > > That's true, I can see in get_initial_defs_for_reduction for constant > > > start values we just splat, which is what I expected. > > > > > > But in vect_transform_cycle_phi with only one initial value we also > > > Just seem to splat and defer the checking of the initial value until > > > after the reduction, > > > > > > /* Try to simplify the vector initialization by applying an > > > adjustment after the reduction has been performed. This > > > can also break a critical path but on the other hand > > > requires to keep the initial value live across the loop. */ > > > if (neutral_op > > > && initial_values.length () == 1 > > > && !VECT_REDUC_INFO_REUSED_ACCUMULATOR (reduc_info) > > > && STMT_VINFO_DEF_TYPE (stmt_info) == vect_reduction_def > > > && !operand_equal_p (neutral_op, initial_values[0])) > > > > > > So the interesting case is where we have an SLP tree with two unknown > > > non-constant values. Though it's unclear to me if we can defer the check > > > for a single value, why not do it for all, surely the scalar final > > > reductions > > > are still cheaper as scalar then element wise insertion in the prologue? > > > > The idea here is of course to avoid keeping the initial value life > > across the loop. There's also !VECT_REDUC_INFO_REUSED_ACCUMULATOR > > here. > > > > True, but they're of course scalar elements so you're less likely to have much > register pressure issues. The issue is that the operations for SHL_AND_INSERT > can't be done for predicates. At least not without round-tripping through > data.
Undestood. > > > > > > > > That said, when doing the bool reduction stuff my general impression > > > > was that all the checks done could have some refactoring to make it > > > > more obvious what check applies to what cases we have. As you show > > > > here, it's not always obvious. > > > > > > > > > > So it looks like the check isn't needed for a single non-constant value. > > > I wonder.. is VECT_REDUC_INFO_INITIAL_VALUES set during > > > Vectorizable_reduction. From the code it looks like I need an additional > > > check on SLP_TREE_LANES (slp_node) != 1? > > > > Maybe simply avoiding the initial value insertion when we cannot > > insert it ... I don't think we put this into costing in any way > > (we also do not compute whether we re-use an accumulator for the > > epilog, we just set things up in a way that it could be, but all > > during transform ...). > > Yeah though this relaxation came 6 years after this code was added to check > ror shift and insert :) > > > > > The initial_values.legnth () == 1 check is for SLP reductions where > > we cannot perform the adjustment in the epilogue because we only > > track the initial value of one reduction, not multiple ones as we'd > > need for SLP reductions (and the "live" issue would become worse there). > > > > So, you'd need to check a SLP bool reduction (two bool accumulators). > > So I tried > > https://godbolt.org/z/zPEoaWPEv > > But it looks like for the single initial value case the code you added in > a54aa75ab30eb1a176ceaded507113252df24878 for PR 115438 doesn't > trigger. Yeah, that code added back what was there for the non-SLP case (I hoped this epilogue adjustment thing was not important...) > So sadly SLP_TREE_LANES (slp_node) != 1 doesn't work... It would be (SLP_TREE_LANES (slp_node) != 1 && !reduc_chain) anyway, but yes. > So for now, I'm just testing > > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc > index 6c242024971..0aa970ab338 100644 > --- a/gcc/tree-vect-loop.cc > +++ b/gcc/tree-vect-loop.cc > @@ -7565,6 +7565,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo, > values into the low-numbered elements. */ > if ((double_reduc || neutral_op) > && !nunits_out.is_constant () > + && !really_constant_p (vect_phi_initial_value (reduc_def_phi)) > && !direct_internal_fn_supported_p (IFN_VEC_SHL_INSERT, > vectype_out, OPTIMIZE_FOR_SPEED)) > { > > Instead which is the case we know works, and it looks like we'll have to give > boolean reduction handling some more TLC next year. I wonder if we can simply set a "must-use-reduc_epilouge_adjustment" flag if we get a not supported case here? If there's any circumstance we cannot use a reduc_epilogue_adjustment we'd have to bail of course. >From what I can see that condition should indeed be (SLP_TREE_LANES (slp_node) != 1 && !reduc_chain). The rest is really "heuristics" (IIRC even the VECT_REDUC_INFO_REUSED_ACCUMULATOR guard). Another simple check to add above is the !operand_equal_p (neutral_op, initial_values[0]) check. Richard. > Cheers, > Tamar > > > > > Richard. > > > > > > > I'll write some examples and see. > > > > > > Thanks, > > > Tamar > > > > > > > Richard. > > > > > > > > > Thanks, > > > > > Tamar > > > > > > > > > > gcc/ChangeLog: > > > > > > > > > > * tree-vect-loop.cc (vectorizable_reduction): Don't require > > > > > IFN_VEC_SHL_INSERT when using reduc sbool optabs. > > > > > > > > > > --- > > > > > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc > > > > > index > > > > > > 6c2420249718237c4f70720b2bd03d4951bd8a5d..ca9ff35c0bb8d8c4a161 > > > > a922bae9c19028492b66 100644 > > > > > --- a/gcc/tree-vect-loop.cc > > > > > +++ b/gcc/tree-vect-loop.cc > > > > > @@ -7565,6 +7565,7 @@ vectorizable_reduction (loop_vec_info > > > > loop_vinfo, > > > > > values into the low-numbered elements. */ > > > > > if ((double_reduc || neutral_op) > > > > > && !nunits_out.is_constant () > > > > > + && (reduc_fn == IFN_LAST || !VECTOR_BOOLEAN_TYPE_P > > > > (vectype_out)) > > > > > && !direct_internal_fn_supported_p (IFN_VEC_SHL_INSERT, > > > > > vectype_out, > > > > OPTIMIZE_FOR_SPEED)) > > > > > { > > > > > > > > > > > > > > > > > > > > > > > -- > > > > Richard Biener <[email protected]> > > > > SUSE Software Solutions Germany GmbH, > > > > Frankenstrasse 146, 90461 Nuernberg, Germany; > > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG > > > > Nuernberg) > > > > > > > -- > > Richard Biener <[email protected]> > > SUSE Software Solutions Germany GmbH, > > Frankenstrasse 146, 90461 Nuernberg, Germany; > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG > > Nuernberg) > -- Richard Biener <[email protected]> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
