This patch fixes a race condition in libgomp.oacc-c-c++-common/data-2-lib.c, an OpenACC test that exercises the runtime wait API used in conjunction with asynchronous OpenACC offloaded regions. I'm not sure why this problem went undetected for so long. Either the parallel region runs fast enough on the GPU that the copied-out data is already correct, or Nvidia's CUDA runtime blocks all device->host data transfers until the GPU is no longer processing the data. I suspect it's the former.
I've applied this patch to trunk and og7 as obvious.

Cesar
2017-12-01  Cesar Philippidis  <ce...@codesourcery.com>

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/data-2-lib.c: Add missing call to acc_wait (1).

diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/data-2-lib.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/data-2-lib.c
index 1694f582363..f553d3d839c 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/data-2-lib.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/data-2-lib.c
@@ -64,6 +64,8 @@ main (int argc, char **argv)
   for (i = 0; i < N; i++)
     b[i] = a[i];
 
+  acc_wait (1);
+
   acc_memcpy_from_device (a, d_a, nbytes);
 
   acc_memcpy_from_device (b, d_b, nbytes);
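
For readers unfamiliar with the pattern, here is a minimal sketch of the same kind of race and its fix. It is not taken from data-2-lib.c: the helper name increment_async and the use of a copy clause are assumptions made for illustration. The region is queued on async queue 1, so the host must call acc_wait (1) before it reads the result. When built without OpenACC support the pragma is ignored and the loop simply runs on the host.

```c
#ifdef _OPENACC
#include <openacc.h>
#endif

/* Hypothetical helper (not part of the testsuite): add 1.0f to each of
   the n elements of a[] in an asynchronous region on queue 1.  */
void
increment_async (float *a, int n)
{
  int i;

  /* With OpenACC enabled, the kernel and the copy-out of a[] are only
     queued here; control returns to the host right away.  */
#pragma acc parallel loop async (1) copy (a[0:n])
  for (i = 0; i < n; i++)
    a[i] += 1.0f;

#ifdef _OPENACC
  /* Block until everything on queue 1 has finished.  Without this, the
     caller could read a[] while the device is still working on it.  */
  acc_wait (1);
#endif
}
```

Dropping the acc_wait (1) call leaves the contents of a[] on return dependent on timing, which is the kind of failure that can stay hidden when the device happens to finish first.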