On Mon, May 27, 2024 at 10:36:21AM +0530, Tejas Belagod wrote: > This patch tests user-defined reductions on various constructs with objects > of SVE type. > > gcc/testsuite/ChangeLog > > * gcc.target/aarch64/sve/omp/udr-sve.c: New test. > --- > .../gcc.target/aarch64/sve/omp/udr-sve.c | 166 ++++++++++++++++++ > 1 file changed, 166 insertions(+) > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/omp/udr-sve.c > > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/omp/udr-sve.c > b/gcc/testsuite/gcc.target/aarch64/sve/omp/udr-sve.c > new file mode 100644 > index 00000000000..049fbee9056 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve/omp/udr-sve.c > @@ -0,0 +1,166 @@ > +/* { dg-do run } */ > +/* { dg-options "-msve-vector-bits=256 -std=gnu99 -fopenmp -O2 > -fdump-tree-ompexp" } */ > + > +#include <arm_sve.h> > + > +#pragma omp declare reduction (+:svint32_t: omp_out = svadd_s32_z > (svptrue_b32(), omp_in, omp_out))
Don't you need initializer clause, or is zero initialization what works for this type? In any case, it would be useful to also test with non-trivial initializer clause. > + > +int parallel_reduction () Function name should go on next line (in various places). > +{ > + int a[8] = {1 ,1, 1, 1, 1, 1, 1, 1}; > + int b[8] = {0 ,0, 0, 0, 0, 0, 0, 0}; The space before , rather than after it is weird. > +int for_reduction () > +{ > + int a[8] = {1 ,1, 1, 1, 1, 1, 1, 1}; > + int b[8] = {0 ,0, 0, 0, 0, 0, 0, 0}; > + svint32_t va = svld1_s32 (svptrue_b32 (), b); > + int i = 0; > + int j; > + int64_t res; > + > + #pragma omp parallel for reduction (+:va, i) > + for (j = 0; j < 8; j++) > + { > + va = svld1_s32 (svptrue_b32 (), a); > + i++; Why the i at all? The loop has 8 iterations, no need to have a reduction for that, just assume it must be 8 * 8 later. Note, for the #pragma omp parallel case that was needed (or you could query omp_get_num_threads ();). > + /* The list includes va that is already vectorized, so the only impact here > + is on the scalar variable i. OMP spec says only scalar variables are > + allowed in the list. Should non-scalars be diagnosed? */ Why it should be diagnosed? The implementation can always choose not to vectorize, vectorization factor 1 is valid, and that is really what we use for various cases we can't deal with (e.g. VLA types) or if the vectorizer gives up. The standard doesn't talk about vectorization at all, just about single instruction multiple data and by data there it means whatever user wrote, so single instruction handling multiple SVE vectors if user wrote that. It is fine if we just handle that as vf=1. > + #pragma omp simd reduction (+:va, i) > + for (j = 0; j < 8; j++) > + { > + va = svld1_s32 (svptrue_b32 (), a); > + i++; Again, why the i reduction? The loop has 8 iterations, so it will be always 8 at the end. > + } > + > + res = svaddv_s32 (svptrue_b32 (), va); > + > + if (res != i) > + __builtin_abort (); > + > + return 0; > +} > + > +int taskloop_reduction () > +{ > + int a[8] = {1 ,1, 1, 1, 1, 1, 1, 1}; > + int b[8] = {0 ,0, 0, 0, 0, 0, 0, 0}; > + svint32_t va = svld1_s32 (svptrue_b32 (), b); > + int i = 0; > + int j; > + int64_t res; > + > + #pragma omp taskloop reduction (+:va, i) > + for (j = 0; j < 8; j++) > + { > + svint32_t tva = svld1_s32 (svptrue_b32 (), a); > + #pragma omp in_reduction (+: va) This isn't a valid OpenMP pragma. -Wunknown-pragmas should be diagnosing that. What do you want to achieve? in_reduction is valid on task, target and taskloop constructs, but requires the var to be either a task_reduction or taskloop's reduction on an outer construct. > + va = svadd_s32_z (svptrue_b32 (), tva, va); Just note that the different iterations of the taskloop could be in different tasks (though, testing taskloop without some #pragma omp parallel surrounding it is again kind of pointless unless you really want to test behavior in that implicit parallel case). And, if it is separate tasks, e.g. 3rd iteration might not see the 2nd iteration's va but its own cleared one, they'll be only combined later. > + i++; > + } > + > + res = svaddv_s32 (svptrue_b32 (), va); > + > + if (res != i * 8) > + __builtin_abort (); > + > + return 0; > +} > + > +int task_reduction () > +{ > + int a[8] = {1 ,1, 1, 1, 1, 1, 1, 1}; > + int b[8] = {0 ,0, 0, 0, 0, 0, 0, 0}; > + svint32_t va = svld1_s32 (svptrue_b32 (), b); > + int i = 0; > + int j; > + int64_t res; > + > + #pragma omp parallel reduction (task,+:va) > + { > + va = svadd_s32_z (svptrue_b32 (), svld1_s32 (svptrue_b32 (), a), va); > + i++; > + } The use of task modifier doesn't make much sense here, if you want to test it, you'd need to create tasks and use in_reduction (+:va) on it. > + > + res = svaddv_s32 (svptrue_b32 (), va); > + > + if (res != i * 8) > + __builtin_abort (); > + > + return 0; > +} > + > +int inscan_reduction_incl () > +{ > + int a[8] = {1 ,1, 1, 1, 1, 1, 1, 1}; > + int b[8] = {0 ,0, 0, 0, 0, 0, 0, 0}; > + svint32_t va = svld1_s32 (svptrue_b32 (), b); > + int j; > + int i = 0; > + int64_t res = 0; > + > + #pragma omp parallel > + #pragma omp for reduction (inscan,+:va, i) > + for (j = 0; j < 8; j++) > + { > + va = svld1_s32 (svptrue_b32 (), a); > + i++; Again, i will be 8. Don't reduce that. > + #pragma omp scan inclusive (va, i) > + res += svaddv_s32 (svptrue_b32 (), va); > + } > + > + if (res != i * 8) > + __builtin_abort (); > + > + return 0; > +} > + > +int > +main() Space before (. > +{ > + parallel_reduction (); > + task_reduction (); > + inscan_reduction_incl (); > + taskloop_reduction (); > + simd_reduction (); > + for_reduction (); > + > + return 0; > +} Jakub