On Mon, Jan 20, 2025 at 8:57 PM Richard Sandiford
<richard.sandif...@arm.com> wrote:
>
> OMP reductions are lowered into the form:
>
>     idx = .OMP_SIMD_LANE (simuid, 0);
>     ...
>     oldval = D.anon[idx];
>     newval = oldval op ...;
>     D.anon[idx] = newval;
>
> So if the scalar loop has a {0, +, 1} iv i, idx = i % vf.
> Despite this wraparound, the vectoriser pretends that the D.anon
> accesses are linear.  It records the .OMP_SIMD_LANE's second argument
> (val) in the data_reference aux field (-1 - val) and then copies this
> to the stmt_vec_info simd_lane_access_p field (val + 1).
>
> vectorizable_load and vectorizable_store use simd_lane_access_p
> to detect accesses of this form and suppress the vector pointer
> increments that would be used for genuine linear accesses.
>
> The difference in this PR is that the reduction is conditional,
> and so the store back to D.anon is recognised as a conditional
> store pattern.  simd_lane_access_p was not being copied across
> from the original stmt_vec_info to the pattern stmt_vec_info,
> meaning that it was vectorised as a normal linear store.

Thanks for fixing.

> Tested on aarch64-linux-gnu & pushed as obvious (I hope).
> I imagine this would eventually be represented differently in
> the new all-SLP world, but I assumed that would be stage 1 material.

Yeah - I stopped short of re-implementing all of this in the hope it
will get simpler once I do not have to care to not break the non-SLP path ...
(same is true for gather/scatter analysis)

Richard.

> Richard
>
>
> gcc/
>         PR tree-optimization/118384
>         * tree-vectorizer.cc (vec_info::move_dr): Copy
>         STMT_VINFO_SIMD_LANE_ACCESS_P.
>
> gcc/testsuite/
>         PR tree-optimization/118384
>         * gcc.target/aarch64/pr118384_1.c: New test.
>         * gcc.target/aarch64/pr118384_2.c: Likewise.
> ---
>  gcc/testsuite/gcc.target/aarch64/pr118384_1.c | 31 +++++++++++++++++++
>  gcc/testsuite/gcc.target/aarch64/pr118384_2.c |  4 +++
>  gcc/tree-vectorizer.cc                        |  2 ++
>  3 files changed, 37 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/pr118384_1.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/pr118384_2.c
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/pr118384_1.c 
> b/gcc/testsuite/gcc.target/aarch64/pr118384_1.c
> new file mode 100644
> index 00000000000..75f6dada63a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/pr118384_1.c
> @@ -0,0 +1,31 @@
> +/* { dg-do run { target aarch64_sve128_hw } } */
> +/* { dg-options "-O2 -fopenmp-simd -fno-trapping-math -msve-vector-bits=128 
> --param aarch64-autovec-preference=sve-only -fstack-protector-strong" } */
> +
> +#pragma GCC target "+sve"
> +
> +[[gnu::noipa]] float f(float *ptr, long n)
> +{
> +  float res = 0.0f;
> +#pragma omp simd reduction(+:res)
> +  for (long i = 0; i < n; ++i)
> +    if (ptr[i] >= 1.0f)
> +      res += ptr[i];
> +  return res;
> +}
> +
> +[[gnu::noipa]] float g(float *ptr, long n)
> +{
> +  return f(ptr, n) + 1;
> +}
> +
> +int
> +main ()
> +{
> +#define N 64 * 1024
> +  float data[N];
> +  for (long i = 0; i < N; ++i)
> +    data[i] = 1;
> +  if (g(data, N) != N + 1)
> +    __builtin_abort();
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/aarch64/pr118384_2.c 
> b/gcc/testsuite/gcc.target/aarch64/pr118384_2.c
> new file mode 100644
> index 00000000000..f45a222bb72
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/pr118384_2.c
> @@ -0,0 +1,4 @@
> +/* { dg-do run { target aarch64_sve256_hw } } */
> +/* { dg-options "-O2 -fopenmp-simd -fno-trapping-math -msve-vector-bits=256 
> --param aarch64-autovec-preference=sve-only -fstack-protector-strong" } */
> +
> +#include "pr118384_1.c"
> diff --git a/gcc/tree-vectorizer.cc b/gcc/tree-vectorizer.cc
> index 01c64700608..f38c8d20a02 100644
> --- a/gcc/tree-vectorizer.cc
> +++ b/gcc/tree-vectorizer.cc
> @@ -633,6 +633,8 @@ vec_info::move_dr (stmt_vec_info new_stmt_info, 
> stmt_vec_info old_stmt_info)
>      = STMT_VINFO_GATHER_SCATTER_P (old_stmt_info);
>    STMT_VINFO_STRIDED_P (new_stmt_info)
>      = STMT_VINFO_STRIDED_P (old_stmt_info);
> +  STMT_VINFO_SIMD_LANE_ACCESS_P (new_stmt_info)
> +    = STMT_VINFO_SIMD_LANE_ACCESS_P (old_stmt_info);
>  }
>
>  /* Permanently remove the statement described by STMT_INFO from the
> --
> 2.25.1
>

Reply via email to