https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119902
Bug ID: 119902
Summary: open-coded scatter/gather should not account vec_to_scalar cost
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: hubicka at gcc dot gnu.org
Target Milestone: ---
As discussed in
https://gcc.gnu.org/pipermail/gcc-patches/2025-April/681555.html
in the loop
> void foo (int n, int *off, double *a)
> {
>   const int m = 32;
>
>   for (int j = 0; j < n/m; ++j)
>     {
>       int const start = j*m;
>       int const end = (j+1)*m;
>
> #pragma GCC ivdep
>       for (int i = start; i < end; ++i)
>         {
>           a[off[i]] = a[i] < 0 ? a[i] : 0;
>         }
>     }
> }
we open-code the scatter instruction. The vectorizer costs it as a vector load
of off followed by vec_to_scalar and 4 stores, while the generated code performs
4 loads and 4 stores, which is cheaper; costing it that way would let us
vectorize more code.
This is also tested by gcc.target/i386/pr89618-2.c, which will need an xfail
after the aforementioned patch.
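
For illustration only, here is a hand-written C model of what "4 loads and 4
stores" means for one vector iteration. It is a sketch, not the vectorizer's
actual output: the names scatter_ref and scatter_open_coded and the constant
LANES are hypothetical, a 4-element vector is assumed, and the 4 loads are
assumed to be scalar loads of the offsets.

```c
#include <assert.h>

enum { LANES = 4 };  /* assumed number of elements per vector */

/* Scalar reference: the inner loop of foo from the report. */
static void scatter_ref (int start, int end, const int *off, double *a)
{
  for (int i = start; i < end; ++i)
    a[off[i]] = a[i] < 0 ? a[i] : 0;
}

/* Model of an open-coded scatter.  The values for one group of LANES
   elements are computed first (standing in for the vector part), then
   each lane does one scalar load of its offset and one scalar store.
   Like the ivdep pragma, this assumes the stores do not feed later
   loads of a[i].  */
static void scatter_open_coded (int start, int end, const int *off, double *a)
{
  assert ((end - start) % LANES == 0);
  for (int i = start; i < end; i += LANES)
    {
      double val[LANES];

      /* Stand-in for the vector compare and select.  */
      for (int l = 0; l < LANES; ++l)
        val[l] = a[i + l] < 0 ? a[i + l] : 0;

      /* Open-coded scatter: LANES offset loads and LANES stores.  */
      for (int l = 0; l < LANES; ++l)
        {
          int idx = off[i + l];  /* scalar load of the offset */
          a[idx] = val[l];       /* scalar store of the lane */
        }
    }
}
```

In this model the offsets are never extracted from a vector register, which is
why charging vec_to_scalar for them would overestimate the cost.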