https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122723
Bug ID: 122723
Summary: Oddities around mask support with .COND_ADD reductions
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: rguenth at gcc dot gnu.org
Target Milestone: ---
double foo (double *a, char *mask, int n)
{
  double sum = 0.0;
  for (int i = 0; i < n; ++i)
    {
      double val;
      if (mask[i])
        val = a[i];
      else
        val = -0.0;
      sum = sum + val;
    }
  return sum;
}
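The select with -0.0 as the else value is what makes this loop a conditional add. For IEEE doubles, x + (-0.0) == x for every x (including +0.0 under round-to-nearest), so sum + val equals mask[i] ? sum + a[i] : sum. Below is a scalar sketch under that reading. cond_add and foo_cond_add are hypothetical helpers that model .COND_ADD (cond, x, y, else) = cond ? x + y : else. They are not GCC code.

```c
/* Test case from the report.  */
double foo (double *a, char *mask, int n)
{
  double sum = 0.0;
  for (int i = 0; i < n; ++i)
    {
      double val;
      if (mask[i])
        val = a[i];
      else
        val = -0.0;
      sum = sum + val;
    }
  return sum;
}

/* Hypothetical scalar model of .COND_ADD (cond, x, y, else).  */
static double cond_add (int cond, double x, double y, double els)
{
  return cond ? x + y : els;
}

/* Hypothetical if-converted form: each step is
   .COND_ADD (mask[i], sum, a[i], sum), i.e. masked-off lanes keep sum.  */
double foo_cond_add (double *a, char *mask, int n)
{
  double sum = 0.0;
  for (int i = 0; i < n; ++i)
    sum = cond_add (mask[i], sum, a[i], sum);
  return sum;
}
```

Called with a = { 1.5, -2.0, 4.25, -0.0, 8.0 } and mask = { 1, 0, 1, 1, 0 }, both functions return 5.75 when compiled without -ffast-math.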
With -Ofast -march=znver4 we get
t.c:4:21: optimized: loop vectorized using 64 byte vectors and unroll factor 64
and no vector epilog. With -O3 -march=znver4 we instead get
t.c:4:21: optimized: loop vectorized using 64 byte vectors and unroll factor 64
t.c:4:21: optimized: epilogue loop vectorized using masked 64 byte vectors and unroll factor 64
The missing epilog with -Ofast is due to
t.c:4:21: note: using single def-use cycle for reduction by reducing multiple vectors to one in the loop body
vect_model_reduction_cost: inside_cost = 0, prologue_cost = 8, epilogue_cost = 32 .
t.c:4:21: missed: can't operate on partial vectors because no conditional operation is available.
That is vect_reduction_update_partial_vector_usage at work. It gets .COND_ADD as 'code' and things go downhill from there: partial vectors are rejected, so no masked epilog is generated.
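Below is a toy model of the failure mode suggested by the missed message. It assumes the check looks up a conditional variant of the reduction code. PLUS has one (COND_ADD), but an already conditional COND_ADD has none, so the "no conditional operation is available" path is taken. The enum and both helpers are hypothetical and do not reflect GCC's internal API.

```c
/* Toy model, NOT GCC code: all names are hypothetical.  */
enum op_code { OP_PLUS, OP_MULT, OP_COND_ADD, OP_COND_MUL, OP_NONE };

/* Map an unconditional operation to its masked (conditional) variant.
   An operation that is already conditional has no further variant.  */
static enum op_code
conditional_variant (enum op_code code)
{
  switch (code)
    {
    case OP_PLUS:
      return OP_COND_ADD;
    case OP_MULT:
      return OP_COND_MUL;
    default:
      return OP_NONE;
    }
}

/* Partial vectors (and thus a masked epilog) are allowed only if a
   conditional variant of the reduction code exists.  */
static int
can_use_partial_vectors (enum op_code reduc_code)
{
  return conditional_variant (reduc_code) != OP_NONE;
}
```

In this model, can_use_partial_vectors (OP_PLUS) is true and can_use_partial_vectors (OP_COND_ADD) is false. That matches the -O3 and -Ofast outcomes above only under the stated assumption.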