gimple_copy missed to unshare_expr its associated trees,
which leads the the issue below.

All other cases seem to be handled as they either
use operators (next lines) - or have an associated
block and are processed above the added code in the
'gimple_has_substatements' block.

Any comments, remarks?

* * *

Background why/how the testcases fail (attached + those of the PRs):

If there is SIMT support, 'omp simd' is copied into a SIMD version
and into a SIMT version. Currently, only nvptx supports SIMT, i.e.
this only happens if also compiling for nvptx offloading.

For
  omp atomic_store relaxed
     D.234487 = *&cnt'
the value expr for 'cnt' is expanded as:
  D.234913 = .omp_data_i->cnt;

However, this only happens in one branch (SIMT) as
both trees are shared - such that the variable is
unintialized in the SIMD.

Thus, this only showed up with compiling (also) for Nvidia
GPUs (with -O1 or higher) - but running on the host or AMD
GPUs.

* * *

Tobias
Fix gimple_copy for OpenMP atomic load/store [PR122281, PR105001]

	PR libgomp/122281
	PR middle-end/105001

gcc/ChangeLog:

	* gimple.cc (gimple_copy): Add missing unshare_expr for
	GIMPLE_OMP_ATOMIC_LOAD and GIMPLE_OMP_ATOMIC_STORE.

libgomp/ChangeLog:

	* testsuite/libgomp.c/pr122281.c: New test.

 gcc/gimple.cc                          | 22 +++++++++++++++++
 libgomp/testsuite/libgomp.c/pr122281.c | 43 ++++++++++++++++++++++++++++++++++
 2 files changed, 65 insertions(+)

diff --git a/gcc/gimple.cc b/gcc/gimple.cc
index 102e21fe5e5..b968a45aaa0 100644
--- a/gcc/gimple.cc
+++ b/gcc/gimple.cc
@@ -2283,6 +2283,28 @@ gimple_copy (gimple *stmt)
 	}
     }
 
+  switch (gimple_code (stmt))
+    {
+    case GIMPLE_OMP_ATOMIC_LOAD:
+      {
+	gomp_atomic_load *g = as_a <gomp_atomic_load *> (copy);
+	gimple_omp_atomic_load_set_lhs (g,
+	  unshare_expr (gimple_omp_atomic_load_lhs (g)));
+	gimple_omp_atomic_load_set_rhs (g,
+	  unshare_expr (gimple_omp_atomic_load_rhs (g)));
+	break;
+      }
+    case GIMPLE_OMP_ATOMIC_STORE:
+      {
+	gomp_atomic_store *g = as_a <gomp_atomic_store *> (copy);
+	gimple_omp_atomic_store_set_val (g,
+	  unshare_expr (gimple_omp_atomic_store_val (g)));
+	break;
+      }
+    default:
+      break;
+    }
+
   /* Make copy of operands.  */
   for (i = 0; i < num_ops; i++)
     gimple_set_op (copy, i, unshare_expr (gimple_op (stmt, i)));
diff --git a/libgomp/testsuite/libgomp.c/pr122281.c b/libgomp/testsuite/libgomp.c/pr122281.c
new file mode 100644
index 00000000000..a02a728c5f0
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/pr122281.c
@@ -0,0 +1,43 @@
+/* { dg-do run }  */
+/* { dg-additional-options "-O3" }  */
+
+/* PR libgomp/122281  */
+/* PR middle-end/105001  */
+
+/* If SIMT is supported, the inner 'omp simd' is duplicated into
+   one SIMT and one SIMD variant. SIMT is currently only supported
+   with nvidia GPUs.  (This only happens with -O1 or higher.)
+
+   The duplication failed for the SIMD case as a tree was shared and
+   the initialization only happened in the SIMT branch, i.e. when
+   compiling for a SIMT-device, all non-SIMD (offload or host devices)
+   accesses failed (segfault) for the atomic update.  */
+
+#include <omp.h>
+
+int __attribute__((noinline, noclone))
+f(int *A, int n, int dev) {
+ int cnt = 0;
+ #pragma omp target map(cnt) device(dev)
+ {
+   #pragma omp parallel for simd
+   for (int i = 0; i < n; i++)
+   if (A[i] != 0)
+     {
+       #pragma omp atomic
+       cnt++;
+     }
+ }
+ return cnt;
+}
+
+int main() {
+  int n = 10;
+  int A[10] = {11,22,33,44,55,66,77,88,99,110};
+
+  /* Run over all devices, including the host; the host should be SIMD,
+     some non-host devices might be SIMT.  */
+  for (int dev = omp_initial_device; dev <= omp_get_num_devices(); dev++)
+    if (f (A, n, dev) != 10)
+      __builtin_abort();
+}

Reply via email to