https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116142
--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Xi Ruoyao from comment #4)
> (In reply to Richard Biener from comment #3)
> > (In reply to Xi Ruoyao from comment #2)
> > > (In reply to Richard Biener from comment #1)
> > > > To make it used by the reduction you'd need to have a dot_product
> > > > covering
> > > > the accumulation as well.
> > >
> > > I can add that, but what if we slightly alter it to something like
> > >
> > > short x[8], y[8];
> > >
> > > int dot() {
> > >   int ret = 0;
> > >   for (int i = 0; i < 8; i++)
> > >     ret ^= x[i] * y[i];
> > >   return ret;
> > > }
> > >
> > > ? It's no longer a dot product but shouldn't
> > > vec_widen_smult_{even,odd}_v8hi be used anyway?
> >
> > Sure, you should see
> >
> > t.c:5:20: note: Analyze phi: ret_13 = PHI <ret_9(5), 0(2)>
> > t.c:5:20: note: reduction path: ret_9 ret_13
> > t.c:5:20: note: reduction: detected reduction
> > t.c:5:20: note: Detected reduction.
> > ...
> > t.c:5:20: note: vect_recog_widen_mult_pattern: detected: _5 = _2 * _4;
> > t.c:5:20: note: widen_mult pattern recognized: patt_24 = _1 w* _3;
> >
> > and then
> >
> > # vect_ret_13.11_12 = PHI <vect_ret_9.12_7(5), { 0, 0, 0, 0 }(2)>
> > # ivtmp_29 = PHI <ivtmp_30(5), 0(2)>
> > vect__1.6_20 = MEM <vector(8) short int> [(short int *)vectp_x.4_22];
> > _1 = x[i_15];
> > _2 = (int) _1;
> > vect__3.9_17 = MEM <vector(8) short int> [(short int *)vectp_y.7_19];
> > vect_patt_23.10_16 = WIDEN_MULT_LO_EXPR <vect__1.6_20, vect__3.9_17>;
> > vect_patt_23.10_14 = WIDEN_MULT_HI_EXPR <vect__1.6_20, vect__3.9_17>;
> > vect_ret_9.12_11 = vect_patt_23.10_16 ^ vect_ret_13.11_12;
> > vect_ret_9.12_7 = vect_patt_23.10_14 ^ vect_ret_9.12_11;
> >
> > at least that's what happens on x86. It should also work with _EVEN/_ODD.
>
> The condition for _EVEN/_ODD is stricter than the one for _HI/_LO.  It
> requires STMT_VINFO_RELEVANT (stmt_info) == vect_used_by_reduction, but
> this condition does not seem to hold for my test cases.
Ah, that's because EVEN/ODD de-interleaves the vector, so to make use of
those results elsewhere we'd need to re-interleave them to build a hi/lo
pair.  Consider ret[i] = x[i] * y[i]: there the lane order matters.  It's
only safe for reductions, where the summation can be re-ordered.
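As a scalar model (a made-up illustration for this PR, not GCC output;
arrays and values are invented), compare the two lane orders and note that
only the re-orderable XOR reduction is insensitive to the permutation:

/* Hypothetical scalar model of WIDEN_MULT_LO/HI vs. WIDEN_MULT_EVEN/ODD
   lane orders; arrays and values are made up for illustration.  */
#include <stdio.h>

short x[8] = {1, 2, 3, 4, 5, 6, 7, 8};
short y[8] = {8, 7, 6, 5, 4, 3, 2, 1};

int main ()
{
  int lo[4], hi[4], even[4], odd[4];
  for (int i = 0; i < 4; i++)
    {
      lo[i] = x[i] * y[i];                  /* products 0..3, in order */
      hi[i] = x[i + 4] * y[i + 4];          /* products 4..7, in order */
      even[i] = x[2 * i] * y[2 * i];        /* products 0,2,4,6 */
      odd[i] = x[2 * i + 1] * y[2 * i + 1]; /* products 1,3,5,7 */
    }
  int r1 = 0, r2 = 0;
  for (int i = 0; i < 4; i++)
    {
      r1 ^= lo[i] ^ hi[i];    /* in-order lanes */
      r2 ^= even[i] ^ odd[i]; /* even/odd-permuted lanes */
    }
  /* XOR is associative and commutative, so r1 == r2.  But lane i of
     the even/odd pair is not product i, so a store like
     ret[i] = x[i] * y[i] could not use EVEN/ODD without
     re-interleaving first.  */
  printf ("%d %d\n", r1, r2);
  return 0;
}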
I'll note the check in question is redundant since we have

  /* Elements in a vector with vect_used_by_reduction property cannot
     be reordered if the use chain with this property does not have the
     same operation.  One such an example is s += a * b, where elements
     in a and b cannot be reordered.  Here we check if the vector defined
     by STMT is only directly used in the reduction statement.  */
  tree lhs = gimple_assign_lhs (stmt_info->stmt);
  stmt_vec_info use_stmt_info = loop_info->lookup_single_use (lhs);
  if (use_stmt_info
      && STMT_VINFO_DEF_TYPE (use_stmt_info) == vect_reduction_def)
    return true;
but it also seems that vect_used_by_reduction is not set very optimistically.
Its definition

  /* defs that feed computations that end up (only) in a reduction.  These
     defs may be used by non-reduction stmts, but eventually, any
     computations/values that are affected by these defs are used to compute
     a reduction (i.e. don't get stored to memory, for example).  We use this
     to identify computations that we can change the order in which they are
     computed.  */
  vect_used_by_reduction,
is also not quite correct, since that would mean

  a = b[i];
  d = x w* y;
  c = a * d;
  res += c;

would be OK, but clearly it would mix even/odd permuted lanes in 'd' with
unpermuted lanes of 'a', computing a wrong value.  So the
STMT_VINFO_RELEVANT (stmt_info) == vect_used_by_reduction check isn't
enough to guarantee correctness here.  I think the latter test is fine on
its own, though.
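And a scalar model of that mix-up (again made up, just to show the wrong
value; names and values are invented): 'd' is even/odd-permuted but 'a' is
loaded in order, so the products pair the wrong lanes even though the final
summation itself may be re-ordered:

/* Hypothetical scalar model: mixing even/odd-permuted lanes in 'd'
   with unpermuted lanes of 'a' before a reduction.  Values made up.  */
#include <stdio.h>

int main ()
{
  int a[8] = {1, 2, 3, 4, 5, 6, 7, 8}; /* unpermuted loads */
  int d[8] = {1, 2, 3, 4, 5, 6, 7, 8}; /* widen-mult results, lane order */
  /* The even/odd de-interleave of d, as EVEN/ODD would leave it.  */
  int dp[8] = {d[0], d[2], d[4], d[6], d[1], d[3], d[5], d[7]};

  int good = 0, bad = 0;
  for (int i = 0; i < 8; i++)
    {
      good += a[i] * d[i];  /* lanes line up */
      bad += a[i] * dp[i];  /* lanes mixed up */
    }
  printf ("%d %d\n", good, bad); /* prints 204 190: a wrong value */
  return 0;
}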
I think we want to remove
vect_used_in_outer_by_reduction/vect_used_by_reduction
and instead propagate a flag that lanes might be arbitrarily permuted.
Can you check if the following makes things work for you?
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 67f6e5df255..7496e31164c 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -14200,7 +14200,6 @@ supportable_widening_operation (vec_info *vinfo,
      are properly set up for the caller.  If we fail, we'll continue with
      a VEC_WIDEN_MULT_LO/HI_EXPR check.  */
   if (vect_loop
-      && STMT_VINFO_RELEVANT (stmt_info) == vect_used_by_reduction
       && !nested_in_vect_loop_p (vect_loop, stmt_info)
       && supportable_widening_operation (vinfo, VEC_WIDEN_MULT_EVEN_EXPR,
                                          stmt_info, vectype_out,