Arsen Arsenović wrote:
Here's a fix for another flaky test.
In r11-3059-g8183ebcdc1c843, Julian fixed a few issues with
atomic_capture-2.c relying on iteration order guarantees that do not
exist under OpenACC parallelized loops and, notably, do not happen even
by accident on AMDGCN.
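Namely, the order in which iterations run is unspecified for
    #pragma acc parallel loop
    for (i = 0; i < N; i++)
      #pragma acc atomic capture
      { idata[i] = igot; igot = i; }
Exactly one iteration captures the original value 1234 of igot;
every other iteration captures the 'i' stored by whichever iteration
ran just before it. Thus, idata[...] holds every i from 0 to N-1
except one (the 'i' of the last iteration to run, which stays in igot).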
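To make the invariant concrete, here is a minimal sketch in plain C (no OpenACC) of a check that does not depend on iteration order. It is not the code from either testcase: run_captures, check_captures, N = 32 and the explicit 'order' array are hypothetical names and values used only for illustration, with the array standing in for whatever order the device happens to pick.

```c
#include <stdbool.h>
#include <stddef.h>

#define N 32          /* hypothetical loop bound, for illustration only */
#define INITIAL 1234  /* initial value of igot, as in the quoted comment */

/* Hypothetical helper: perform the capture { idata[i] = igot; igot = i; }
   once per iteration, visiting iterations in the order given by 'order',
   which must be a permutation of 0..N-1.  Returns the final igot.  */
static int run_captures(const int *order, int *idata)
{
  int igot = INITIAL;
  for (size_t k = 0; k < N; k++)
    {
      int i = order[k];
      idata[i] = igot;
      igot = i;
    }
  return igot;
}

/* Order-independent check: INITIAL must have been captured exactly once,
   and every i in 0..N-1 must appear exactly once, either as a captured
   value in idata or as the final value of igot.  */
static bool check_captures(const int *idata, int final_igot)
{
  int seen[N] = { 0 };
  int initial_count = 0;

  if (final_igot < 0 || final_igot >= N)
    return false;
  seen[final_igot]++;

  for (size_t i = 0; i < N; i++)
    {
      int v = idata[i];
      if (v == INITIAL)
        initial_count++;
      else if (v >= 0 && v < N)
        seen[v]++;
      else
        return false;
    }

  if (initial_count != 1)
    return false;
  for (size_t i = 0; i < N; i++)
    if (seen[i] != 1)
      return false;
  return true;
}
```

The check passes for any permutation passed to run_captures, whereas a check such as "idata[i] == i - 1" only passes when the iterations happen to run in ascending order.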
* * *
Likewise for the others.
The atomic_capture-3.c testcase was created in commit
r12-310-g4cf3b10f27b199 by copying atomic_capture-2.c and adding
additional options. However, it was copied from an older version of
atomic_capture-2.c that lacked these ordering fixes, so the same
failures resurfaced in this test.
atomic_capture-3.c seems to be identical to atomic_capture-2.c
except for the explicitly added:
    /* { dg-additional-options "-fmodulo-sched -fmodulo-sched-allow-regmoves" } */
for PR rtl-optimization/100225 and PR rtl-optimization/84878.
* * *
This patch ports those fixes from atomic_capture-2.c into
atomic_capture-3.c.
libgomp/ChangeLog:
* testsuite/libgomp.oacc-c-c++-common/atomic_capture-3.c: Copy
changes in r11-3059-g8183ebcdc1c843 from atomic_capture-2.c.
LGTM - thanks for digging!
Tobias
PS: I wonder whether it wouldn't have been more sensible to use
#include "atomic_capture-2.c" and mention the PRs in a comment.
But that's a 2021 topic ...
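
For illustration only, a hypothetical sketch of what atomic_capture-3.c might look like under that approach. It assumes that any other DejaGnu directives the test needs would still be given in this file, since directives inside an #included file are not scanned; it is not part of the patch:

```c
/* Hypothetical layout, not the committed testcase.  */
/* { dg-additional-options "-fmodulo-sched -fmodulo-sched-allow-regmoves" } */

/* See PR rtl-optimization/100225 and PR rtl-optimization/84878.  */

#include "atomic_capture-2.c"
```

With such a layout, later fixes to atomic_capture-2.c would apply to both tests automatically.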