On Mon, Jan 20, 2025 at 8:57 PM Richard Sandiford <richard.sandif...@arm.com> wrote: > > OMP reductions are lowered into the form: > > idx = .OMP_SIMD_LANE (simuid, 0); > ... > oldval = D.anon[idx]; > newval = oldval op ...; > D.anon[idx] = newval; > > So if the scalar loop has a {0, +, 1} iv i, idx = i % vf. > Despite this wraparound, the vectoriser pretends that the D.anon > accesses are linear. It records the .OMP_SIMD_LANE's second argument > (val) in the data_reference aux field (-1 - val) and then copies this > to the stmt_vec_info simd_lane_access_p field (val + 1). > > vectorizable_load and vectorizable_store use simd_lane_access_p > to detect accesses of this form and suppress the vector pointer > increments that would be used for genuine linear accesses. > > The difference in this PR is that the reduction is conditional, > and so the store back to D.anon is recognised as a conditional > store pattern. simd_lane_access_p was not being copied across > from the original stmt_vec_info to the pattern stmt_vec_info, > meaning that it was vectorised as a normal linear store.
Thanks for fixing. > Tested on aarch64-linux-gnu & pushed as obvious (I hope). > I imagine this would eventually be represented differently in > the new all-SLP world, but I assumed that would be stage 1 material. Yeah - I stopped short of re-implementing all of this in the hope it will get simpler once I do not have to care to not break the non-SLP path ... (same is true for gather/scatter analysis) Richard. > Richard > > > gcc/ > PR tree-optimization/118384 > * tree-vectorizer.cc (vec_info::move_dr): Copy > STMT_VINFO_SIMD_LANE_ACCESS_P. > > gcc/testsuite/ > PR tree-optimization/118384 > * gcc.target/aarch64/pr118384_1.c: New test. > * gcc.target/aarch64/pr118384_2.c: Likewise. > --- > gcc/testsuite/gcc.target/aarch64/pr118384_1.c | 31 +++++++++++++++++++ > gcc/testsuite/gcc.target/aarch64/pr118384_2.c | 4 +++ > gcc/tree-vectorizer.cc | 2 ++ > 3 files changed, 37 insertions(+) > create mode 100644 gcc/testsuite/gcc.target/aarch64/pr118384_1.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/pr118384_2.c > > diff --git a/gcc/testsuite/gcc.target/aarch64/pr118384_1.c > b/gcc/testsuite/gcc.target/aarch64/pr118384_1.c > new file mode 100644 > index 00000000000..75f6dada63a > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/pr118384_1.c > @@ -0,0 +1,31 @@ > +/* { dg-do run { target aarch64_sve128_hw } } */ > +/* { dg-options "-O2 -fopenmp-simd -fno-trapping-math -msve-vector-bits=128 > --param aarch64-autovec-preference=sve-only -fstack-protector-strong" } */ > + > +#pragma GCC target "+sve" > + > +[[gnu::noipa]] float f(float *ptr, long n) > +{ > + float res = 0.0f; > +#pragma omp simd reduction(+:res) > + for (long i = 0; i < n; ++i) > + if (ptr[i] >= 1.0f) > + res += ptr[i]; > + return res; > +} > + > +[[gnu::noipa]] float g(float *ptr, long n) > +{ > + return f(ptr, n) + 1; > +} > + > +int > +main () > +{ > +#define N 64 * 1024 > + float data[N]; > + for (long i = 0; i < N; ++i) > + data[i] = 1; > + if (g(data, N) != N + 1) > + __builtin_abort(); > + return 0; > +} > diff --git a/gcc/testsuite/gcc.target/aarch64/pr118384_2.c > b/gcc/testsuite/gcc.target/aarch64/pr118384_2.c > new file mode 100644 > index 00000000000..f45a222bb72 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/pr118384_2.c > @@ -0,0 +1,4 @@ > +/* { dg-do run { target aarch64_sve256_hw } } */ > +/* { dg-options "-O2 -fopenmp-simd -fno-trapping-math -msve-vector-bits=256 > --param aarch64-autovec-preference=sve-only -fstack-protector-strong" } */ > + > +#include "pr118384_1.c" > diff --git a/gcc/tree-vectorizer.cc b/gcc/tree-vectorizer.cc > index 01c64700608..f38c8d20a02 100644 > --- a/gcc/tree-vectorizer.cc > +++ b/gcc/tree-vectorizer.cc > @@ -633,6 +633,8 @@ vec_info::move_dr (stmt_vec_info new_stmt_info, > stmt_vec_info old_stmt_info) > = STMT_VINFO_GATHER_SCATTER_P (old_stmt_info); > STMT_VINFO_STRIDED_P (new_stmt_info) > = STMT_VINFO_STRIDED_P (old_stmt_info); > + STMT_VINFO_SIMD_LANE_ACCESS_P (new_stmt_info) > + = STMT_VINFO_SIMD_LANE_ACCESS_P (old_stmt_info); > } > > /* Permanently remove the statement described by STMT_INFO from the > -- > 2.25.1 >