https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116142
--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Xi Ruoyao from comment #4)
> (In reply to Richard Biener from comment #3)
> > (In reply to Xi Ruoyao from comment #2)
> > > (In reply to Richard Biener from comment #1)
> > > > To make it used by the reduction you'd need to have a dot_product
> > > > covering
> > > > the accumulation as well.
> > >
> > > I can add that, but what if we slightly alter it to something like
> > >
> > > short x[8], y[8];
> > >
> > > int dot() {
> > >   int ret = 0;
> > >   for (int i = 0; i < 8; i++)
> > >     ret ^= x[i] * y[i];
> > >   return ret;
> > > }
> > >
> > > ? It's no longer a dot product but shouldn't
> > > vec_widen_smult_{even,odd}_v8hi be used anyway?
> >
> > Sure, you should see
> >
> > t.c:5:20: note: Analyze phi: ret_13 = PHI <ret_9(5), 0(2)>
> > t.c:5:20: note: reduction path: ret_9 ret_13
> > t.c:5:20: note: reduction: detected reduction
> > t.c:5:20: note: Detected reduction.
> > ...
> > t.c:5:20: note: vect_recog_widen_mult_pattern: detected: _5 = _2 * _4;
> > t.c:5:20: note: widen_mult pattern recognized: patt_24 = _1 w* _3;
> >
> > and then
> >
> > # vect_ret_13.11_12 = PHI <vect_ret_9.12_7(5), { 0, 0, 0, 0 }(2)>
> > # ivtmp_29 = PHI <ivtmp_30(5), 0(2)>
> > vect__1.6_20 = MEM <vector(8) short int> [(short int *)vectp_x.4_22];
> > _1 = x[i_15];
> > _2 = (int) _1;
> > vect__3.9_17 = MEM <vector(8) short int> [(short int *)vectp_y.7_19];
> > vect_patt_23.10_16 = WIDEN_MULT_LO_EXPR <vect__1.6_20, vect__3.9_17>;
> > vect_patt_23.10_14 = WIDEN_MULT_HI_EXPR <vect__1.6_20, vect__3.9_17>;
> > vect_ret_9.12_11 = vect_patt_23.10_16 ^ vect_ret_13.11_12;
> > vect_ret_9.12_7 = vect_patt_23.10_14 ^ vect_ret_9.12_11;
> >
> > at least that's what happens on x86. It should also work with _EVEN/_ODD.
>
> The condition for _EVEN/_ODD is stricter than the one for _HI/_LO.  It
> requires STMT_VINFO_RELEVANT (stmt_info) == vect_used_by_reduction, but
> this condition does not seem to hold for my test cases.
Ah, that's because EVEN/ODD de-interleaves the vector, so to make use of
those results elsewhere we'd need to re-interleave them to build a hi/lo
pair.  Consider ret[i] = x[i] * y[i]: there the lane order matters.  It's
only safe for reductions, where the summation can be re-ordered.
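As a scalar model (a made-up illustration for this PR, not GCC output;
arrays and values are invented), compare the two lane orders and note that
only the re-orderable XOR reduction is insensitive to the permutation:

/* Hypothetical scalar model of WIDEN_MULT_LO/HI vs. WIDEN_MULT_EVEN/ODD
   lane orders; arrays and values are made up for illustration.  */
#include <stdio.h>

short x[8] = {1, 2, 3, 4, 5, 6, 7, 8};
short y[8] = {8, 7, 6, 5, 4, 3, 2, 1};

int main ()
{
  int lo[4], hi[4], even[4], odd[4];
  for (int i = 0; i < 4; i++)
    {
      lo[i] = x[i] * y[i];                  /* products 0..3, in order */
      hi[i] = x[i + 4] * y[i + 4];          /* products 4..7, in order */
      even[i] = x[2 * i] * y[2 * i];        /* products 0,2,4,6 */
      odd[i] = x[2 * i + 1] * y[2 * i + 1]; /* products 1,3,5,7 */
    }
  int r1 = 0, r2 = 0;
  for (int i = 0; i < 4; i++)
    {
      r1 ^= lo[i] ^ hi[i];    /* in-order lanes */
      r2 ^= even[i] ^ odd[i]; /* even/odd-permuted lanes */
    }
  /* XOR is associative and commutative, so r1 == r2.  But lane i of
     the even/odd pair is not product i, so a store like
     ret[i] = x[i] * y[i] could not use EVEN/ODD without
     re-interleaving first.  */
  printf ("%d %d\n", r1, r2);
  return 0;
}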
I'll note the check in question is redundant since we have

  /* Elements in a vector with vect_used_by_reduction property cannot
     be reordered if the use chain with this property does not have the
     same operation.  One such an example is s += a * b, where elements
     in a and b cannot be reordered.  Here we check if the vector defined
     by STMT is only directly used in the reduction statement.  */
  tree lhs = gimple_assign_lhs (stmt_info->stmt);
  stmt_vec_info use_stmt_info = loop_info->lookup_single_use (lhs);
  if (use_stmt_info
      && STMT_VINFO_DEF_TYPE (use_stmt_info) == vect_reduction_def)
    return true;
but it also seems that vect_used_by_reduction is not set very optimistically.
Its definition

  /* defs that feed computations that end up (only) in a reduction.  These
     defs may be used by non-reduction stmts, but eventually, any
     computations/values that are affected by these defs are used to compute
     a reduction (i.e. don't get stored to memory, for example).  We use this
     to identify computations that we can change the order in which they are
     computed.  */
  vect_used_by_reduction,
is also not quite correct, since that would mean

  a = b[i];
  d = x w* y;
  c = a * d;
  res += c;

would be OK, but clearly it would mix even/odd permuted lanes in 'd' with
unpermuted lanes of 'a', computing a wrong value.  So the
STMT_VINFO_RELEVANT (stmt_info) == vect_used_by_reduction check isn't
enough to guarantee correctness here.  I think the latter test is fine on
its own, though.
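And a scalar model of that mix-up (again made up, just to show the wrong
value; names and values are invented): 'd' is even/odd-permuted but 'a' is
loaded in order, so the products pair the wrong lanes even though the final
summation itself may be re-ordered:

/* Hypothetical scalar model: mixing even/odd-permuted lanes in 'd'
   with unpermuted lanes of 'a' before a reduction.  Values made up.  */
#include <stdio.h>

int main ()
{
  int a[8] = {1, 2, 3, 4, 5, 6, 7, 8}; /* unpermuted loads */
  int d[8] = {1, 2, 3, 4, 5, 6, 7, 8}; /* widen-mult results, lane order */
  /* The even/odd de-interleave of d, as EVEN/ODD would leave it.  */
  int dp[8] = {d[0], d[2], d[4], d[6], d[1], d[3], d[5], d[7]};

  int good = 0, bad = 0;
  for (int i = 0; i < 8; i++)
    {
      good += a[i] * d[i];  /* lanes line up */
      bad += a[i] * dp[i];  /* lanes mixed up */
    }
  printf ("%d %d\n", good, bad); /* prints 204 190: a wrong value */
  return 0;
}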
I think we want to remove
vect_used_in_outer_by_reduction/vect_used_by_reduction
and instead propagate a flag that lanes might be arbitrarily permuted.
Can you check if the following makes things work for you?
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 67f6e5df255..7496e31164c 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -14200,7 +14200,6 @@ supportable_widening_operation (vec_info *vinfo,
      are properly set up for the caller.  If we fail, we'll continue with
      a VEC_WIDEN_MULT_LO/HI_EXPR check.  */
   if (vect_loop
-      && STMT_VINFO_RELEVANT (stmt_info) == vect_used_by_reduction
       && !nested_in_vect_loop_p (vect_loop, stmt_info)
       && supportable_widening_operation (vinfo, VEC_WIDEN_MULT_EVEN_EXPR,
                                          stmt_info, vectype_out,