On Fri, Oct 18, 2024 at 11:52:23AM +0530, Tejas Belagod wrote:
> This patch adds a test scaffold for OpenMP compile tests in under the gcc.target
> testsuite. It also adds a target tests directory libgomp.target along with an
> SVE execution test
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.target/aarch64/sve/omp/gomp.exp: New scaffold.

s/scaffold/test driver/ ?
Also, my slight preference would be gomp subdirectory rather than omp,
consistency is nice.

> libgomp/ChangeLog:
>
> 	* testsuite/libgomp.target/aarch64/aarch64.exp: New scaffold.

Likewise. Plus I wonder about the libgomp.target name. In gcc/testsuite/
we have gcc.target, g++.target and gfortran.target subdirectories so it is
clear which languages they handle, but libgomp.target could mean anything.
So, wouldn't libgomp.target.c or libgomp.c-target be better directory name?
The latter to match e.g. libgomp.oacc-{c,c++,fortran}.

> 	* testsuite/libgomp.target/aarch64/shared.c: New test.
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/omp/gomp.exp
> @@ -0,0 +1,46 @@
> +# Copyright (C) 2006-2024 Free Software Foundation, Inc.

s/2024/2025/ before committing anything, otherwise copyright bumping won't
handle it next year either.

> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.target/aarch64/aarch64.exp
> @@ -0,0 +1,57 @@
> +# Copyright (C) 2006-2024 Free Software Foundation, Inc.

Ditto.

> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.target/aarch64/shared.c
> @@ -0,0 +1,186 @@
> +/* { dg-do run { target aarch64_sve256_hw } } */
> +/* { dg-options "-msve-vector-bits=256 -std=gnu99 -fopenmp -O2 -fdump-tree-ompexp" } */

Is -std=gnu99 needed (now that gcc defaults to -std=gnu23)?
I guess most of -std=gnu99 is from the time when C99 wasn't the default.

> +
> +#include <arm_sve.h>
> +#include <stdint.h>
> +#include <stdlib.h>
> +#include <stdbool.h>
> +
> +svint32_t
> +__attribute__ ((noinline))
> +explicit_shared (svint32_t a, svint32_t b, svbool_t p)
> +{
> +
> +#pragma omp parallel shared (a, b, p) num_threads (1)
> +  {
> +    /* 'a', 'b' and 'p' are explicitly shared. */
> +    a = svadd_s32_z (p, a, b);
> +  }

With the num_threads (1) it isn't a good example, then the parallel
is pretty much useless. Would be better to test without that, doesn't have
to be tons of threads, but at least 2-4.
With num_threads (2) it is then racy though, stores the same a in all
threads.
Can one have arrays of svint32_t? If not, perhaps
  svint32_t c;
  #pragma omp parallel shared (a, b, c, p) num_threads (2)
  #pragma omp sections
  {
    /* 'a', 'b', 'c' and 'p' are explicitly shared. */
    a = svadd_s32_z (p, a, b);
    #pragma omp section
    c = svadd_s32_z (p, a, b);
  }
  #pragma omp parallel shared (a, b, c, p) num_threads (2)
  #pragma omp sections
  {
    a = svadd_s32_z (p, a, b);
    #pragma omp section
    c = svadd_s32_z (p, c, b);
  }
  compare_vec (a, c);
  return a;
?

> +svint32_t
> +__attribute__ ((noinline))
> +implicit_shared_default (svint32_t a, svint32_t b, svbool_t p)
> +{
> +
> +#pragma omp parallel default (shared) num_threads (1)
> +  {
> +    /* 'a', 'b' and 'p' are implicitly shared. */
> +    a = svadd_s32_z (p, a, b);

Again, bad example, works only with num_threads (1), otherwise it is racy.

> +svint32_t
> +__attribute__ ((noinline))
> +mix_shared (svint32_t b, svbool_t p)
> +{
> +
> +  svint32_t a;
> +  int32_t *m = (int32_t *)malloc (8 * sizeof (int32_t));

Formatting, missing space before malloc

> +  int i;
> +
> +#pragma omp parallel for
> +  for (i = 0; i < 8; i++)
> +    m[i] = i;
> +
> +#pragma omp parallel
> +  {
> +    /* 'm' is predetermined shared here. 'a' is implicitly shared here. */
> +    a = svld1_s32 (svptrue_b32 (), m);

This is racy. Either different threads need to write to different
shared variables, or if arrays of vectors work to different elements of
array, or it can be guarded with say #pragma omp masked (so that only
specific thread does that), or use just low number of threads and write
depending on omp_get_thread_num () to this or that. Just note that
you could get fewer threads than you asked for.

> +#pragma omp parallel num_threads (1)
> +  {
> +    /* 'a', 'b' and 'p' are implicitly shared here. */
> +    a = svadd_s32_z (p, a, b);
> +  }
> +
> +#pragma omp parallel shared (a, b, p) num_threads (1)
> +  {
> +    /* 'a', 'b' and 'p' are explicitly shared here. */
> +    a = svadd_s32_z (p, a, b);
> +  }

These aren't racy during num_threads (1), but because of that not really
good examples on how shared works.

> +  int32_t *m = (int32_t *)malloc (8 * sizeof (int32_t));

See above.

> +  int i;
> +
> +#pragma omp parallel for
> +  /* 'm' is predetermined shared here. */
> +  for (i = 0; i < 8; i++)
> +  {
> +    m[i] = i;
> +  }

No need for the {}s around the body.

> +
> +#pragma omp parallel
> +  {
> +    /* 'a' is predetermined shared here. */
> +    static int64_t n;
> +    svint32_t a;
> +    #pragma omp parallel
> +    {
> +      /* 'n' is predetermined shared here. */
> +      if (x)
> +        {
> +          a = svld1_s32 (svptrue_b32 (), m);
> +          n = svaddv_s32 (svptrue_b32 (), a);

Again, racy.

> +        }
> +      if (!x && n != 28)
> +        __builtin_abort ();
> +  svint32_t x = svindex_s32 (0 ,1);

Formatting, space after comma rather than before.

> +  svint32_t y = svindex_s32 (8, 1);
> +  svint32_t a, b;
> +  svbool_t p;
> +
> +  /* Implicit shared. */
> +  a = foo (x, y, p);
> +  b = implicit_shared_default (x, y, p);
> +  compare_vec (a, b);
> +
> +  /* Explicit shared. */
> +  a = foo (x ,y, p);
> +  b = explicit_shared (x, y, p);
> +  compare_vec (a, b);
> +
> +  /* Implicit shared with no default clause. */
> +  a = foo (x ,y, p);

Formatting.

> +  b = implicit_shared_no_default (x, y, p);
> +  compare_vec (a, b);
> +
> +  /* Mix shared. */
> +  a = foo (x ,y, p);

Again.

> +  b = mix_shared (y, p);
> +  compare_vec (a, b);
> +
> +  /* Predetermined shared. */
> +  predetermined_shared_static (true);
> +  predetermined_shared_static (false);
> +
> +  return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "value-expr: \*.omp_data_i->a" 10 "ompexp" } } */

	Jakub
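
For illustration only, and not part of the patch or the review: a minimal
sketch of the three race-free patterns the review lists (different shared
variables per thread, a write guarded by #pragma omp masked, and a write
selected by omp_get_thread_num ()). It uses plain int data instead of
svint32_t so that it builds on any target. The function names, the array
size N and the fallback omp_get_thread_num stub are assumptions made up for
this sketch.

```c
#include <assert.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Fallback stub (an assumption of this sketch) so the file also builds
   without -fopenmp, in which case the pragmas are simply ignored.  */
static int omp_get_thread_num (void) { return 0; }
#endif

#define N 8

/* Pattern 1: each section writes a different shared variable.  */
int
sections_distinct (const int *a, const int *b)
{
  int r1 = 0, r2 = 0;

#pragma omp parallel shared (a, b, r1, r2) num_threads (2)
#pragma omp sections
  {
    {
      for (int i = 0; i < N; i++)
        r1 += a[i] + b[i];
    }
#pragma omp section
    {
      for (int i = 0; i < N; i++)
        r2 += a[i] + b[i];
    }
  }
  /* Both sections computed the same sum into separate variables.  */
  return r1 == r2 ? r1 : -1;
}

/* Pattern 2: only one thread of the team writes the shared variable.  */
int
masked_writer (const int *m)
{
  int sum = 0;

#pragma omp parallel shared (m, sum)
  {
#pragma omp masked
    for (int i = 0; i < N; i++)
      sum += m[i];
  }
  return sum;
}

/* Pattern 3: each thread writes the slot selected by its thread number.
   Fewer threads than requested may run, so slot 1 may stay unwritten.  */
int
per_thread_slots (const int *m)
{
  int out[2] = { -1, -1 };

#pragma omp parallel shared (m, out) num_threads (2)
  {
    int s = 0;
    for (int i = 0; i < N; i++)
      s += m[i];
    out[omp_get_thread_num ()] = s;
  }
  return (out[1] == -1 || out[1] == out[0]) ? out[0] : -1;
}

/* Hypothetical driver: checks all three patterns on 0..7 and 8..15.  */
void
run_checks (void)
{
  int a[N], b[N];
  for (int i = 0; i < N; i++)
    {
      a[i] = i;
      b[i] = 8 + i;
    }
  assert (sections_distinct (a, b) == 120);
  assert (masked_writer (a) == 28);
  assert (per_thread_slots (a) == 28);
}
```

Calling run_checks () from main and building with -fopenmp exercises all
three patterns. The reads of the results happen after the implicit barrier
at the end of each parallel region, so no extra synchronization is needed
there.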