https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119902
            Bug ID: 119902
           Summary: open-coded scatter/gather should not account
                    vec_to_scalar cost
           Product: gcc
           Version: 16.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: hubicka at gcc dot gnu.org
  Target Milestone: ---

As discussed in
https://gcc.gnu.org/pipermail/gcc-patches/2025-April/681555.html
in the loop

> void foo (int n, int *off, double *a)
> {
>   const int m = 32;
>
>   for (int j = 0; j < n/m; ++j)
>     {
>       int const start = j*m;
>       int const end = (j+1)*m;
>
> #pragma GCC ivdep
>       for (int i = start; i < end; ++i)
>         {
>           a[off[i]] = a[i] < 0 ? a[i] : 0;
>         }
>     }
> }

we open-code the scatter instruction. The vectorizer costs it as a vector
load of off followed by vec_to_scalar and 4 stores, while the generated code
consists of 4 loads and 4 stores, which is cheaper and would let us
vectorize more code.

This is also tested by gcc.target/i386/pr89618-2.c, which will need an xfail
after the aforementioned patch.
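
To make the shape of the generated code concrete, here is a minimal C sketch of what an open-coded scatter amounts to for the inner loop above. It is an illustration only, not GCC's actual output. The names scatter_scalar and scatter_open_coded are hypothetical, and VF = 4 is an assumption taken from the "4 stores" in the description. The point is that each lane needs one scalar load of its offset and one scalar store, so no vector load of off followed by a vec_to_scalar extraction is needed on the offset side.

```c
#include <assert.h>

enum { VF = 4 };  /* assumed vectorization factor (4 doubles per vector) */

/* Reference: the scalar inner loop from the report, for i in [start, end). */
static void scatter_scalar(int start, int end, const int *off, double *a)
{
    for (int i = start; i < end; ++i)
        a[off[i]] = a[i] < 0 ? a[i] : 0;
}

/* Hypothetical emulation of an open-coded scatter, VF lanes at a time.
   As with "#pragma GCC ivdep", it assumes a store to a[off[i]] never
   feeds a later read of a[i] in the same range. */
static void scatter_open_coded(int start, int end, const int *off, double *a)
{
    assert((end - start) % VF == 0);

    for (int i = start; i < end; i += VF) {
        double val[VF];

        /* Stands in for the vector load, compare and select on a[i..i+VF-1]. */
        for (int k = 0; k < VF; ++k)
            val[k] = a[i + k] < 0 ? a[i + k] : 0;

        /* The open-coded scatter: per lane, one scalar load of the offset
           and one scalar store of the lane's value. */
        for (int k = 0; k < VF; ++k) {
            int idx = off[i + k];   /* scalar load of off */
            a[idx] = val[k];        /* scalar store */
        }
    }
}
```

Under the no-aliasing assumption, both functions leave a in the same state. The second one only makes the per-lane loads and stores explicit, which is what the cost model would ideally account for.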