https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119902

            Bug ID: 119902
           Summary: open-coded scatter/gather should not account
                    vec_to_scalar cost
           Product: gcc
           Version: 16.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: hubicka at gcc dot gnu.org
  Target Milestone: ---

As discussed in
https://gcc.gnu.org/pipermail/gcc-patches/2025-April/681555.html
in the loop

> void foo (int n, int *off, double *a)
> {
>   const int m = 32;
>
>   for (int j = 0; j < n/m; ++j)
>     {
>       int const start = j*m;
>       int const end = (j+1)*m;
>
> #pragma GCC ivdep
>       for (int i = start; i < end; ++i)
>         {
>           a[off[i]] = a[i] < 0 ? a[i] : 0;
>         }
>     }
> }

we open-code the scatter instruction. The vectorizer costs it as a vector load
of off followed by vec_to_scalar and 4 stores, while the generated code actually
performs 4 loads and 4 stores, which is cheaper. Costing it that way would let
us vectorize more code.
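
To make the shape of an open-coded scatter concrete, here is a rough scalar
model in C. It is only a sketch, not the code GCC emits. It assumes 4 lanes per
vector (matching the "4 stores" above), and the names foo_ref, foo_open_coded,
LANES and vals are hypothetical. The point is that each lane needs just one
load of off and one store, with no vector load of off and no lane extraction.

```c
enum { LANES = 4 };  /* assumed vector width: 4 doubles per vector */

/* Scalar reference: the loop nest from the report (hypothetical name). */
void foo_ref(int n, const int *off, double *a)
{
    const int m = 32;
    for (int j = 0; j < n / m; ++j)
        for (int i = j * m; i < (j + 1) * m; ++i)
            a[off[i]] = a[i] < 0 ? a[i] : 0;
}

/* Hand-written model of the vectorized loop with an open-coded scatter.
   Like the ivdep pragma, it assumes the stores do not feed later loads. */
void foo_open_coded(int n, const int *off, double *a)
{
    const int m = 32;
    for (int j = 0; j < n / m; ++j)
        for (int i = j * m; i < (j + 1) * m; i += LANES) {
            double vals[LANES];

            /* Stands in for the vector load, compare and select of a[i..i+3]. */
            for (int k = 0; k < LANES; ++k)
                vals[k] = a[i + k] < 0 ? a[i + k] : 0;

            /* Open-coded scatter: per lane, one load of off and one store. */
            for (int k = 0; k < LANES; ++k)
                a[off[i + k]] = vals[k];
        }
}
```

With offsets that do not alias the elements being read, both functions produce
the same contents in a.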

This is also tested by gcc.target/i386/pr89618-2.c, which will need an xfail
after the aforementioned patch.
