On Mon, May 27, 2024 at 10:36:21AM +0530, Tejas Belagod wrote:
> This patch tests user-defined reductions on various constructs with objects
> of SVE type.
> 
> gcc/testsuite/ChangeLog
> 
>       * gcc.target/aarch64/sve/omp/udr-sve.c: New test.
> ---
>  .../gcc.target/aarch64/sve/omp/udr-sve.c      | 166 ++++++++++++++++++
>  1 file changed, 166 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/omp/udr-sve.c
> 
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/omp/udr-sve.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/omp/udr-sve.c
> new file mode 100644
> index 00000000000..049fbee9056
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/omp/udr-sve.c
> @@ -0,0 +1,166 @@
> +/* { dg-do run } */
> +/* { dg-options "-msve-vector-bits=256 -std=gnu99 -fopenmp -O2 
> -fdump-tree-ompexp" } */
> +
> +#include <arm_sve.h>
> +
> +#pragma omp declare reduction (+:svint32_t: omp_out = svadd_s32_z 
> (svptrue_b32(), omp_in, omp_out))

Don't you need initializer clause, or is zero initialization what works for
this type?
In any case, it would be useful to also test with non-trivial initializer
clause.

> +
> +int parallel_reduction ()

Function name should go on next line (in various places).

> +{
> +  int a[8] = {1 ,1, 1, 1, 1, 1, 1, 1};
> +  int b[8] = {0 ,0, 0, 0, 0, 0, 0, 0};

The space before , rather than after it is weird.

> +int for_reduction ()
> +{
> +  int a[8] = {1 ,1, 1, 1, 1, 1, 1, 1};
> +  int b[8] = {0 ,0, 0, 0, 0, 0, 0, 0};
> +  svint32_t va = svld1_s32 (svptrue_b32 (), b);
> +  int i = 0;
> +  int j;
> +  int64_t res;
> +
> +  #pragma omp parallel for reduction (+:va, i)
> +  for (j = 0; j < 8; j++)
> +    {
> +      va = svld1_s32 (svptrue_b32 (), a);
> +      i++;

Why the i at all?  The loop has 8 iterations, no need
to have a reduction for that, just assume it must be 8 * 8 later.
Note, for the #pragma omp parallel case that was needed (or you could query
omp_get_num_threads ();).

> +  /* The list includes va that is already vectorized, so the only impact here
> +     is on the scalar variable i.  OMP spec says only scalar variables are
> +     allowed in the list.  Should non-scalars be diagnosed?  */

Why it should be diagnosed?  The implementation can always choose not to
vectorize, vectorization factor 1 is valid, and that is really what we use
for various cases we can't deal with (e.g. VLA types) or if the vectorizer
gives up.  The standard doesn't talk about vectorization at all, just about
single instruction multiple data and by data there it means whatever user
wrote, so single instruction handling multiple SVE vectors if user wrote
that.  It is fine if we just handle that as vf=1.

> +  #pragma omp simd reduction (+:va, i)
> +  for (j = 0; j < 8; j++)
> +    {
> +      va = svld1_s32 (svptrue_b32 (), a);
> +      i++;

Again, why the i reduction?  The loop has 8 iterations, so it will be always
8 at the end.
> +    }
> +
> +  res = svaddv_s32 (svptrue_b32 (), va);
> +
> +  if (res != i)
> +    __builtin_abort ();
> +
> +  return 0;
> +}
> +
> +int taskloop_reduction ()
> +{
> +  int a[8] = {1 ,1, 1, 1, 1, 1, 1, 1};
> +  int b[8] = {0 ,0, 0, 0, 0, 0, 0, 0};
> +  svint32_t va = svld1_s32 (svptrue_b32 (), b);
> +  int i = 0;
> +  int j;
> +  int64_t res;
> +
> +  #pragma omp taskloop reduction (+:va, i)
> +  for (j = 0; j < 8; j++)
> +    {
> +      svint32_t tva = svld1_s32 (svptrue_b32 (), a);
> +      #pragma omp in_reduction (+: va)

This isn't a valid OpenMP pragma.
-Wunknown-pragmas should be diagnosing that.
What do you want to achieve?
in_reduction is valid on task, target and taskloop constructs,
but requires the var to be either a task_reduction or taskloop's
reduction on an outer construct.

> +      va = svadd_s32_z (svptrue_b32 (), tva, va);

Just note that the different iterations of the taskloop could be
in different tasks (though, testing taskloop without some #pragma omp
parallel surrounding it is again kind of pointless unless you really
want to test behavior in that implicit parallel case).
And, if it is separate tasks, e.g. 3rd iteration might not see the
2nd iteration's va but its own cleared one, they'll be only combined later.


> +      i++;
> +    }
> +
> +  res = svaddv_s32 (svptrue_b32 (), va);
> +
> +  if (res != i * 8)
> +    __builtin_abort ();
> +
> +  return 0;
> +}
> +
> +int task_reduction ()
> +{
> +  int a[8] = {1 ,1, 1, 1, 1, 1, 1, 1};
> +  int b[8] = {0 ,0, 0, 0, 0, 0, 0, 0};
> +  svint32_t va = svld1_s32 (svptrue_b32 (), b);
> +  int i = 0;
> +  int j;
> +  int64_t res;
> +
> +  #pragma omp parallel reduction (task,+:va)
> +  {
> +    va = svadd_s32_z (svptrue_b32 (), svld1_s32 (svptrue_b32 (), a), va);
> +    i++;
> +  }

The use of task modifier doesn't make much sense here, if you want to test
it, you'd need to create tasks and use in_reduction (+:va) on it.
> +
> +  res = svaddv_s32 (svptrue_b32 (), va);
> +
> +  if (res != i * 8)
> +    __builtin_abort ();
> +
> +  return 0;
> +}
> +
> +int inscan_reduction_incl ()
> +{
> +  int a[8] = {1 ,1, 1, 1, 1, 1, 1, 1};
> +  int b[8] = {0 ,0, 0, 0, 0, 0, 0, 0};
> +  svint32_t va = svld1_s32 (svptrue_b32 (), b);
> +  int j;
> +  int i = 0;
> +  int64_t res = 0;
> +
> +  #pragma omp parallel
> +  #pragma omp for reduction (inscan,+:va, i)
> +  for (j = 0; j < 8; j++)
> +    {
> +      va = svld1_s32 (svptrue_b32 (), a);
> +      i++;

Again, i will be 8.  Don't reduce that.

> +      #pragma omp scan inclusive (va, i)
> +      res += svaddv_s32 (svptrue_b32 (), va);
> +    }
> +
> +  if (res != i * 8)
> +    __builtin_abort ();
> +
> +  return 0;
> +}
> +
> +int
> +main()

Space before (.
> +{
> +  parallel_reduction ();
> +  task_reduction ();
> +  inscan_reduction_incl ();
> +  taskloop_reduction ();
> +  simd_reduction ();
> +  for_reduction ();
> +
> +  return 0;
> +}

        Jakub

Reply via email to