https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120176
Bug ID: 120176
Summary: Missed reduction chain vectorization
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: rguenth at gcc dot gnu.org
Target Milestone: ---

We fail to handle the 2nd loop of the PR98138 testcase as reduction-chain:

    for( int i = 0; i < 4; i++ )
    {
        HADAMARD4( a0, a1, a2, a3, tmp[0][i], tmp[1][i], tmp[2][i], tmp[3][i] );
        sum += abs2(a0) + abs2(a1) + abs2(a2) + abs2(a3);
    }

because after reassoc we have inside of the loop

  sum.0_77 = (unsigned int) sum_152;
  _157 = _153 + sum.0_77;
  _88 = _157 + _132;
  _78 = _88 + _127;
  sum_108 = (int) _78;

and on the exit

  _143 = sum_108 & 65535;
  _79 = (unsigned int) _143;
  sum.1_80 = (unsigned int) sum_108;

which would be fine, but then forwprop applies folding of the
int <-> unsigned int roundtrip and we end up with

  # sum_152 = PHI <sum_108(4), 0(3)>
  sum.0_77 = (unsigned int) sum_152;
  _157 = sum.0_77 + _153;
  _88 = _132 + _157;
  _78 = _88 + _127;
  sum_108 = (int) _78;

  _75 = _78 & 65535;
  _81 = _78 >> 16;

which then fails reduction chain detection in the vectorizer as the
reduction path ends with sum_108 but the live variable on the exit is _78
and so we fail the single-use check on _78:

      if ((op.code != code && !leading_conversion)
          /* We can only handle the final value in epilogue
             generation for reduction chains.  */
          || (i != 1 && !has_single_use (gimple_get_lhs (stmt))))
        is_slp_reduc = false;

The fix is to either avoid the folding with some extensive heuristics or
relax the reduction chain detection, allowing either the trailing conversion
or the last computation result to be live.

Not handling this case as a reduction chain blocks us from using larger than
V4SI vectors for the 2nd loop.
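
For readers without the PR98138 testcase at hand, the following is a minimal,
self-contained sketch of the loop shape discussed above. It is not the testcase
itself: the bodies of HADAMARD4 and abs2 are assumptions modelled on the x264
SATD helpers, the wrapper name second_loop is hypothetical, and the final
return expression is inferred from the exit statements in the dump
(& 65535 and >> 16). The point is only to show where the int <-> unsigned int
roundtrip on sum comes from: sum is a signed int accumulator fed by unsigned
arithmetic.

```c
#include <stdint.h>

/* Assumed helper, modelled on x264; the report only shows its use.  */
#define HADAMARD4(d0, d1, d2, d3, s0, s1, s2, s3) \
  do { \
    uint32_t t0 = (s0) + (s1); \
    uint32_t t1 = (s0) - (s1); \
    uint32_t t2 = (s2) + (s3); \
    uint32_t t3 = (s2) - (s3); \
    d0 = t0 + t2; \
    d2 = t0 - t2; \
    d1 = t1 + t3; \
    d3 = t1 - t3; \
  } while (0)

/* Assumed helper: absolute value of two 16-bit lanes packed into one
   32-bit word.  */
static inline uint32_t
abs2 (uint32_t a)
{
  uint32_t s = ((a >> 15) & 0x10001u) * 0xffffu;
  return (a + s) ^ s;
}

/* Hypothetical wrapper around the second loop.  */
unsigned int
second_loop (uint32_t tmp[4][4])
{
  uint32_t a0, a1, a2, a3;
  int sum = 0;  /* signed accumulator, unsigned addends */

  for (int i = 0; i < 4; i++)
    {
      HADAMARD4 (a0, a1, a2, a3,
                 tmp[0][i], tmp[1][i], tmp[2][i], tmp[3][i]);
      /* Lowered as: convert sum to unsigned, add, convert back to int.  */
      sum += abs2 (a0) + abs2 (a1) + abs2 (a2) + abs2 (a3);
    }

  /* Both uses on the exit read the int result; once forwprop folds the
     roundtrip they read the unsigned value inside the loop instead.  */
  return (((uint16_t) sum) + ((unsigned int) sum >> 16)) >> 1;
}
```

Compiling such a function with -O3 -fdump-tree-vect-details should allow
checking whether the loop is detected as a reduction chain; the exact dump
contents depend on the GCC revision and target.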