https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120176
Bug ID: 120176
Summary: Missed reduction chain vectorization
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: rguenth at gcc dot gnu.org
Target Milestone: ---
We fail to handle the second loop of the PR98138 testcase as a reduction chain:
for( int i = 0; i < 4; i++ )
{
    HADAMARD4( a0, a1, a2, a3, tmp[0][i], tmp[1][i], tmp[2][i], tmp[3][i] );
    sum += abs2(a0) + abs2(a1) + abs2(a2) + abs2(a3);
}
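To make the loop above compilable in isolation, here is a minimal sketch. The definitions of `abs2` and `HADAMARD4` are assumptions modeled on x264's SATD code and are not copied from the PR98138 testcase, and the wrapper name `satd_second_loop` is hypothetical. Under these assumptions `abs2` returns `uint32_t`, so `sum += ...` is evaluated in unsigned int and converted back to int. That is where the `(unsigned int) sum_152` and `(int) _78` conversions in the dumps below come from.

```c
#include <stdint.h>

/* Assumed helper (modeled on x264): absolute value of two packed
   16-bit lanes held in one 32-bit word.  */
static inline uint32_t abs2(uint32_t a)
{
    uint32_t s = ((a >> 15) & 0x10001) * 0xffff;
    return (a + s) ^ s;
}

/* Assumed 4-point Hadamard butterfly (modeled on x264).  */
#define HADAMARD4(d0, d1, d2, d3, s0, s1, s2, s3) do { \
    int t0 = (s0) + (s1);                              \
    int t1 = (s0) - (s1);                              \
    int t2 = (s2) + (s3);                              \
    int t3 = (s2) - (s3);                              \
    (d0) = t0 + t2;                                    \
    (d2) = t0 - t2;                                    \
    (d1) = t1 + t3;                                    \
    (d3) = t1 - t3;                                    \
} while (0)

/* Hypothetical wrapper around the second loop from this report.  */
int satd_second_loop(uint32_t tmp[4][4])
{
    uint32_t a0, a1, a2, a3;
    int sum = 0;
    for (int i = 0; i < 4; i++)
    {
        HADAMARD4(a0, a1, a2, a3, tmp[0][i], tmp[1][i], tmp[2][i], tmp[3][i]);
        /* Four additions into the same accumulator per iteration: the
           reduction chain.  The RHS is uint32_t, so the update is done in
           unsigned int and converted back to int.  */
        sum += abs2(a0) + abs2(a1) + abs2(a2) + abs2(a3);
    }
    /* Exit computation: low 16 bits plus high 16 bits, halved
       (this is where sum & 65535 and sum >> 16 come from).  */
    return (((uint16_t)sum) + ((uint32_t)sum >> 16)) >> 1;
}
```

For example, a `tmp` filled with ones yields 16 in `sum` and 8 as the return value.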
because after reassoc we have the following inside the loop:
sum.0_77 = (unsigned int) sum_152;
_157 = _153 + sum.0_77;
_88 = _157 + _132;
_78 = _88 + _127;
sum_108 = (int) _78;
and on the loop exit:
_143 = sum_108 & 65535;
_79 = (unsigned int) _143;
sum.1_80 = (unsigned int) sum_108;
This would be fine, but forwprop then folds the int <-> unsigned int
round trip, and we end up with:
# sum_152 = PHI <sum_108(4), 0(3)>
sum.0_77 = (unsigned int) sum_152;
_157 = sum.0_77 + _153;
_88 = _132 + _157;
_78 = _88 + _127;
sum_108 = (int) _78;
_75 = _78 & 65535;
_81 = _78 >> 16;
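The fold is value-preserving because GCC documents conversion to a signed integer type as reduction modulo 2^N (ISO C leaves the out-of-range unsigned -> int case implementation-defined). Hence `(unsigned int)(int)x == x`, and masking through the int-typed `sum_108` gives the same bits as masking `_78` directly. The sketch below mirrors the GIMPLE names as C variables. The `>> 16` applied to `sum.1_80` before folding is an assumption: the dump above only shows the folded `_78 >> 16`.

```c
#include <stdint.h>

/* Exit values computed before forwprop, through the int-typed sum_108.  */
uint32_t exit_low_before(uint32_t v78)
{
    int sum_108 = (int)v78;          /* sum_108 = (int) _78 (modulo in GCC) */
    int t143 = sum_108 & 65535;      /* _143 = sum_108 & 65535 */
    return (uint32_t)t143;           /* _79 = (unsigned int) _143 */
}

uint32_t exit_high_before(uint32_t v78)
{
    int sum_108 = (int)v78;
    uint32_t sum_1_80 = (uint32_t)sum_108;  /* sum.1_80 = (unsigned int) sum_108 */
    return sum_1_80 >> 16;                  /* assumed use of sum.1_80 */
}

/* After forwprop the round trip is gone and _78 is used directly,
   which makes _78 live on the loop exit.  */
uint32_t exit_low_after(uint32_t v78)  { return v78 & 65535; }  /* _75 */
uint32_t exit_high_after(uint32_t v78) { return v78 >> 16; }    /* _81 */
```

Both pairs agree for every 32-bit input, including values such as `0x80000000` and `0xffffffff` that are negative when viewed as int.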
This then fails reduction chain detection in the vectorizer: the
reduction path ends with sum_108, but the value live on the exit
is _78, so _78 has more than one use and we fail the single-use check on it:
if ((op.code != code && !leading_conversion)
    /* We can only handle the final value in epilogue
       generation for reduction chains. */
    || (i != 1 && !has_single_use (gimple_get_lhs (stmt))))
  is_slp_reduc = false;
The fix is either to avoid the folding with some extensive heuristics,
or to relax the reduction chain detection so that either the trailing
conversion or the last computation result may be live.
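A toy model of the single-use rule and of the proposed relaxation may help. Everything here is hypothetical: `struct chain_stmt`, `chain_ok_strict` and `chain_ok_relaxed` are illustrative names, not GCC internals. The model ignores the opcode check and orders the path from the PHI result to the value fed back into the PHI. Reading `i != 1` in the quoted check as "the statement producing the final value" is our interpretation, based on its comment. The relaxed rule conservatively rejects the case where both the conversion and its operand are live.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one statement on the reduction path.  */
struct chain_stmt {
    bool is_conversion;  /* e.g. sum_108 = (int) _78 */
    int num_uses;        /* number of uses of the statement's LHS */
};

/* Strict rule, modeled on the quoted check: only the final statement
   on the path may have more than one use.  */
bool chain_ok_strict(const struct chain_stmt *path, size_t n)
{
    if (n == 0)
        return false;
    for (size_t k = 0; k + 1 < n; k++)
        if (path[k].num_uses != 1)
            return false;
    return true;
}

/* Relaxed rule: if the path ends in a conversion, either the conversion
   or the computation feeding it may be live after the loop.  */
bool chain_ok_relaxed(const struct chain_stmt *path, size_t n)
{
    if (n == 0)
        return false;
    size_t checked = n - 1;  /* leading statements that must be single-use */
    if (n >= 2 && path[n - 1].is_conversion)
    {
        /* Conservatively reject the case where both values are live.  */
        if (path[n - 1].num_uses > 1 && path[n - 2].num_uses > 1)
            return false;
        checked = n - 2;
    }
    for (size_t k = 0; k < checked; k++)
        if (path[k].num_uses != 1)
            return false;
    return true;
}
```

For the dumps above, the path is sum.0_77, _157, _88, _78, sum_108. Before forwprop, sum_108 has three uses (the PHI, _143 and sum.1_80), and both rules accept the path. After forwprop, _78 has three uses (sum_108, _75 and _81), so the strict rule rejects the path while the relaxed rule accepts it.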
Not handling this case as a reduction chain prevents us from using vectors
larger than V4SI for the second loop.