On Wed, 12 Nov 2025, Richard Biener wrote:

> On Tue, 11 Nov 2025, Victor Do Nascimento wrote:
> 
> > Sorry about delayed reply,I've been obsessing on trying to solve one
> > last snag in the peeling for alignment implementation.
> > 
> > The reduction work was done trying to fix a regression I'd introduced in
> > the libstdc++-v3 unit tests during the course of the implementing of
> > these patches.
> > 
> > I reduced the unit test in question and I'm attaching that at the end of 
> > this
> > message.
> > 
> > Ironically, in writing up different testcases in plain C for
> > gcc.dg/vect, I've since discovered I fail to vectorize far simpler reduction
> > cases.
> > 
> > For the trivial case of
> > 
> >   sum = 0;
> >   while (1)
> >     {
> >       if (a[i] == 0) break;
> >       sum += a[i];
> >       i++;
> >     }
> > 
> > we get the following CFG:
> > 
> >   <bb 2> [local count: 118111600]:
> >   _17 = *a_8(D);
> >   if (_17 == 0)
> >     goto <bb 7>; [11.00%]
> >   else
> >     goto <bb 5>; [89.00%]
> > 
> >   <bb 5> [local count: 105119324]:
> > 
> >   <bb 3> [local count: 955630224]:
> >   # _18 =
> >   # sum_19 = PHI <sum_10(6), sum_7(D)(5)>
> >   # i_21 = PHI <i_11(6), 0(5)>
> >   sum_10 = _18 + sum_19;
> >   i_11 = i_21 + 1;
> >   _1 = (long unsigned int) i_11;
> >   _2 = _1 * 4;
> >   _3 = a_8(D) + _2;
> >   _4 = *_3;
> >   if (_4 == 0)
> >     goto <bb 8>; [11.00%]
> >   else
> >     goto <bb 6>; [89.00%]
> > 
> > and the vectorizer doesn't quite know how to handle the PHI <_4(6), _17(5)>,
> > categorized as `vect_unknown_def_type', so I'll figure out what to do about
> > that.
> 
> Generally for reductions we do not support an intermediate (aka _18
> here) result to be live because as reductions are re-associated we
> cannot compute this specific value.  Now, specifically _this_ value
> we _can_ compute, so the restriction is a bit over-eager.
> 
> But it's not sth to solve as part of this series.  Would be interesting
> to track in a bug though.
> 
> > Now for the promised testcase...
> > 
> > #include <numeric>
> > #include <iterator>
> > #include <cassert>
> > 
> > int a[]  = {4, 5, 6, 7, 8, 9, 10, 11};
> > double b[] = {0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5};
> > int N = 8;
> > 
> > template<typename _InputIterator1, typename _InputIterator2, typename _Tp>
> > _Tp
> > transform_reduce_a(_InputIterator1 a0, _InputIterator1 aN,
> >               _InputIterator2 b0, _Tp accum)
> > {
> >   while ((aN - a0) >= 4)
> 
> So we fail to analyze this as a counted loop?

Indeed.  But not sure why.  We have

  # a0_46 = PHI <a0_26(11), &a(2)>
...
  a0_26 = a0_46 + 16;
  _28 = &MEM <int[8]> [(void *)&a + 32B] - a0_26;
  if (_28 > 12)
    goto <bb 11>; [89.00%]

as IV.  I'll check.

It works on the 2nd loop but we fail vectorization because the reduction
has a float accumulator but adds double values.  This is currently
not a supported "conversion" around reductions, we only support
sign-conversions here.

So I'm not sure how this is a testcase for the fold-left reduction
as we do not even classify it as reduction?

With all double we vectorize the 2nd loop.

> >     {
> >       _Tp __v1 = (a0[0] * b0[0]) + (a0[1] * b0[1]);
> >       _Tp __v2 = (a0[2] * b0[2]) + (a0[3] * b0[3]);
> >       _Tp __v3 = (__v1 + __v2);
> >       accum = (accum + __v3);
> >       a0 += 4;
> >       b0 += 4;
> >     }
> >   for (; a0 != aN; ++a0, (void) ++b0)
> >     accum = (accum + (*a0 * *b0));
> >   return accum;
> > }
> > 
> > void
> > test01()
> > {
> >   auto res = transform_reduce_a(std::begin(a), std::end(a), std::begin(b),
> >                            std::move (1.0f));
> >   assert( res == (float)(1 + 30) );
> > }
> > int
> > main()
> > {
> >   test01();
> > }
> > 
> > 
> > Many thanks,
> > Victor
> > 
> > On 11/11/25 13:59, Richard Biener wrote:
> > > On Tue, 11 Nov 2025, Tamar Christina wrote:
> > > 
> > >>> -----Original Message-----
> > >>> From: Richard Biener <[email protected]>
> > >>> Sent: 11 November 2025 12:59
> > >>> To: Tamar Christina <[email protected]>
> > >>> Cc: Victor Do Nascimento <[email protected]>; gcc-
> > >>> [email protected]
> > >>> Subject: RE: [PATCH 08/13] vect: Reclassify early break fold left
> > >>> reductions as
> > >>> simple reductions
> > >>>
> > >>> On Tue, 11 Nov 2025, Tamar Christina wrote:
> > >>>
> > >>>>> -----Original Message-----
> > >>>>> From: Richard Biener <[email protected]>
> > >>>>> Sent: 11 November 2025 12:16
> > >>>>> To: Victor Do Nascimento <[email protected]>
> > >>>>> Cc: [email protected]; Tamar Christina
> > >>> <[email protected]>;
> > >>>>> Victor Do Nascimento <[email protected]
> > >>>>> 1.compute.internal>
> > >>>>> Subject: Re: [PATCH 08/13] vect: Reclassify early break fold left
> > >>>>> reductions
> > >>> as
> > >>>>> simple reductions
> > >>>>>
> > >>>>> On Mon, 10 Nov 2025, Victor Do Nascimento wrote:
> > >>>>>
> > >>>>>> From: Victor Do Nascimento <[email protected]
> > >>>>> 1.compute.internal>
> > >>>>>>
> > >>>>>> This re-categorization of reductions for uncounted loops involving
> > >>>>>> reductions leads to the correct calling of
> > >>>>>> `vect_create_epilog_for_reduction' function.
> > >>>>>>
> > >>>>>> gcc/ChangeLog:
> > >>>>>>
> > >>>>>>  * tree-vect-loop.cc (vectorizable_reduction): Reclassify
> > >>>>>>  uncounted-loop VECT_REDUC_INFO_TYPE as
> > >>>>> TREE_CODE_REDUCTION.
> > >>>>>> ---
> > >>>>>>   gcc/tree-vect-loop.cc | 5 +++--
> > >>>>>>   1 file changed, 3 insertions(+), 2 deletions(-)
> > >>>>>>
> > >>>>>> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > >>>>>> index 901903cfbea..3b038169c95 100644
> > >>>>>> --- a/gcc/tree-vect-loop.cc
> > >>>>>> +++ b/gcc/tree-vect-loop.cc
> > >>>>>> @@ -7426,8 +7426,9 @@ vectorizable_reduction (loop_vec_info
> > >>>>> loop_vinfo,
> > >>>>>>                               "supported.\n");
> > >>>>>>          return false;
> > >>>>>>          }
> > >>>>>> -      VECT_REDUC_INFO_TYPE (reduc_info)
> > >>>>>> -        = reduction_type = FOLD_LEFT_REDUCTION;
> > >>>>>> +      VECT_REDUC_INFO_TYPE (reduc_info) = reduction_type
> > >>>>>> +        = LOOP_VINFO_NITERS_UNCOUNTED_P (loop_vinfo) ?
> > >>>>> TREE_CODE_REDUCTION
> > >>>>>> +        : FOLD_LEFT_REDUCTION;
> > >>>>>
> > >>>>> I don't think this is correct.  We've arrived here with a
> > >>>>> needs_fold_left_reduction_p check, if we cannot use a
> > >>>>> FOLD_LEFT_REDUCTION
> > >>>>> we have to fail.
> > >>>>>
> > >>>>> That said, instead of vect_create_epilog_for_reduction this goes
> > >>>>> through vectorize_fold_left_reduction which re-uses the original
> > >>>>> scalar reduction PHI and thus any specific early-break handling would
> > >>>>> need to go there.
> > >>>>>
> > >>>>> I believe that if this is an issue with respect to re-starting then
> > >>>>> that very same issue is present generally for early break 
> > >>>>> vectorization.
> > >>>>
> > >>>> Agree, I think vectorizable_reduction is missing support for reducing
> > >>>> from def 0.
> > >>>>
> > >>>> Note that we mostly normally fail to analyse the reduction so we never
> > >>>> get here hence the missing support, so I'm somewhat surprised uncounted
> > >>> loops did.
> > >>>>
> > >>>> Is there a testcase that shows this?
> > >>>
> > >>> Just add a FP reduction w/o -ffast-math to any existing early break
> > >>> testcase?  You can simply reduce x += 5. or so I think, so no loads
> > >>> necessary.
> > >>
> > >> You mean like this? https://godbolt.org/z/4jbKx7j5a
> > >>
> > >> At first glance that looks correct to me, the early exits use ret_12 and
> > >> the
> > >> main exit uses ret_6.
> > >>
> > >> So that's handled correctly.
> > > 
> > > Indeed.  So I wonder what goes wrong in the uncounted case - and
> > > possibly the peeled case with early break.
> > > 
> > > So I echo Tamar then, Victor, do you have a testcase that shows what
> > > goes wrong?
> > > 
> > > Thanks,
> > > Richard.
> > 
> > 
> > 
> 
> 

-- 
Richard Biener <[email protected]>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Jochen Jaser, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Reply via email to