> -----Original Message-----
> From: Tobias Burnus <[email protected]>
> Sent: 19 January 2026 16:46
> To: Matthew Malcomson <[email protected]>; [email protected]; Jakub Jelinek <[email protected]>
> Cc: Julian Brown <[email protected]>; Thomas Schwinge <[email protected]>; Andrew Stubbs <[email protected]>; Tom de Vries <[email protected]>; Sebastian Huber <[email protected]>
> Subject: Re: [PATCH 2/5] libgomp: Ensure memory sync after performing tasks
> 
> On November 26, 2025, [email protected] wrote:
> > As described in PR 122356 there is a theoretical bug around not
> > "publishing" user data written in a task when that task has been
> > executed by a thread after entry to a barrier.
> 
> LGTM – but can you also execute 'test()' in the testcase¹ on the
> device?  That way, nvptx (not modified) and gcn also get tested.
> 
> Thanks for the patch and sorry for the delay.
> 
> Tobias
> 
> ¹I mean as with the previous patch:
>    test (); // execute on the host (in the OpenMP sense)
>    #pragma omp target
>      test ();  // execute on the default device
> 
> which runs it either twice on the host (if the default device is the
> host) or once on the host and once on the GPU (nvptx or gcn), if
> available.
> 
> (quote w/o comments:)
Hi Tobias,
The attached patch similarly adds target pragmas to the test, as in the previous patch.
Does it look OK?

Thanks,
Prathamesh
> 
> > Key points of the C memory model that are relevant:
> > 1) Memory writes can be seen in a different order in different threads.
> > 2) When one thread (A) reads a value with acquire memory ordering that
> >     another thread (B) has written with release memory ordering, then all
> >     data written in thread (B) before the write that set this value will
> >     be visible to thread (A) after that read.
> > 3) This point requires that the read and write operate on the same
> >     value.  The guarantee is one-way:  It specifies that thread (A) will
> >     see the writes that thread (B) has performed before the specified
> >     write.  It does not specify that thread (B) will see writes that
> >     thread (A) has performed before reading this value.
> >
> > Outline of the issue:
> > 1) While there is a memory sync at entry to the barrier, user code can
> >     be run after threads have all entered the barrier.
> > 2) There are various points where a memory sync can occur after entry to
> >     the barrier:
> >     - One thread getting the `task_lock` mutex that another thread has
> >       released.
> >     - Last thread incrementing `bar->generation` with `MEMMODEL_RELEASE`
> >       and some other thread reading it with `MEMMODEL_ACQUIRE`.
> >     However, there are code paths that can avoid these points.
> > 3) On the code-paths that can avoid these points we could have no memory
> >     synchronisation between a write to user data that happened in a task
> >     executed after entry to the barrier, and some other thread running
> >     the implicit task after the barrier.  Hence that "other thread" may
> >     read a stale value that should have been overwritten in the explicit
> >     task.
> >
> > There are two code-paths that I believe I've identified:
> > 1) The last thread sees `task_count == 0` and increments the generation
> >     with `MEMMODEL_RELEASE` before continuing on to the next implicit
> >     task.
> >     If some other thread had executed a task that wrote user data I
> >     don't see any way in which an acquire-release ordering *from* the
> >     thread writing user data *to* the last thread would have been formed.
> > 2) After all threads have entered the barrier: some thread (A) is
> >     waiting in `do_wait`.  Some other thread (B) completes a task writing
> >     user data.  Thread (B) increments the generation using
> >     `gomp_team_barrier_done` (non-atomically -- hence not allowing the
> >     formation of any acquire-release ordering with this write).  Thread
> >     (A) reads that generation with `MEMMODEL_ACQUIRE`, but since the
> >     write was not atomic that does not form an ordering.
> >
> > This patch makes two changes:
> > 1) The write that sets `task_count` to zero in
> >     `gomp_barrier_handle_tasks` is made atomic, as is the read of
> >     `task_count` in `gomp_team_barrier_wait_end`.  This addresses the
> >     first case by forming an acquire-release ordering *from* the thread
> >     executing tasks *to* the thread that will increment the generation
> >     and continue.
> > 2) The write of `bar->generation` via `gomp_team_barrier_done` called
> >     from `gomp_barrier_handle_tasks` is done atomically.  This means
> >     that it will form an acquire-release synchronisation with the
> >     existing atomic read of `bar->generation` in the main loop of
> >     `gomp_team_barrier_wait_end`.
> >
> > Concerns I have with this patch:
> > 1) I've made the change as "seems correct" from reading the code and
> >     have checked that it builds, but can't be that certain in my
> >     modifications.
> >     - Should I remove changes outside of any targets that I can't check
> >       myself instead?
> >
> > 2) I don't understand the nvptx backend.  Looks like it might be correct
> >     since it has some kind of atomic operation on `task_count` to check
> >     for zero.  I've left it alone rather than try to understand.
> >
> > 3) I believe that we technically still have UB in this area (that was
> >     around before).  TSAN pointed it out in some experiments I was
> >     running.  If we are writing the value non-atomically for the
> >     task_count != 1 case and reading the value atomically in the primary
> >     thread then I believe this is UB simply because there is a read and a
> >     write happening to the same location where at least one is not
> >     atomic.
> >     - Have not attempted to fix this because this kind of UB is already
> >       quite widespread in libgomp (according to TSAN) -- e.g.
> >       `task_count` is read in GOMP_taskloop and GOMP_task outside of the
> >       `task_lock` mutex and that's technically UB because there's no
> >       synchronisation between those reads and the writes under the mutex.
> >     - I figure that fixing this kind of UB is a much larger job to be
> >       taken on at a later time.
> >
> > Testing done:
> > - Bootstrap & regtest on aarch64 and x86_64.
> >    - With & without _LIBGOMP_CHECKING_.
> >    - Testsuite with & without OMP_WAIT_POLICY=passive
> > - Cross compilation & regtest on arm.
> > - TSAN run on this as part of all my upstream patches.
> >
> > libgomp/ChangeLog:
> >
> >       PR libgomp/122356
> >       * config/gcn/bar.c (gomp_team_barrier_wait_end): Atomically read
> >       team->task_count.
> >       (gomp_team_barrier_wait_cancel_end): Likewise.
> >       * config/gcn/bar.h (gomp_team_barrier_done): Atomically write
> >       bar->generation.
> >       * config/linux/bar.c (gomp_team_barrier_wait_end): Atomically
> >       read team->task_count.
> >       (gomp_team_barrier_wait_cancel_end): Likewise.
> >       * config/linux/bar.h (gomp_team_barrier_done): Atomically write
> >       bar->generation.
> >       * config/posix/bar.c (gomp_team_barrier_wait_end): Atomically
> >       read team->task_count.
> >       (gomp_team_barrier_wait_cancel_end): Likewise.
> >       * config/posix/bar.h (gomp_team_barrier_done): Atomically write
> >       bar->generation.
> >       * config/rtems/bar.h (gomp_team_barrier_done): Atomically write
> >       bar->generation.
> >       * task.c (gomp_barrier_handle_tasks): Atomically write
> >       team->task_count when decrementing to zero.
> >       * testsuite/libgomp.c/pr122356.c: New test.
> >
> > Signed-off-by: Matthew Malcomson <[email protected]>
> > ---
> >   libgomp/config/gcn/bar.c               |  8 +++++--
> >   libgomp/config/gcn/bar.h               |  5 +++-
> >   libgomp/config/linux/bar.c             |  8 +++++--
> >   libgomp/config/linux/bar.h             |  5 +++-
> >   libgomp/config/posix/bar.c             |  8 +++++--
> >   libgomp/config/posix/bar.h             |  5 +++-
> >   libgomp/config/rtems/bar.h             |  5 +++-
> >   libgomp/task.c                         |  8 ++++++-
> >   libgomp/testsuite/libgomp.c/pr122356.c | 33 ++++++++++++++++++++++++++
> >   9 files changed, 74 insertions(+), 11 deletions(-)
> >   create mode 100644 libgomp/testsuite/libgomp.c/pr122356.c
> >
> > diff --git a/libgomp/config/gcn/bar.c b/libgomp/config/gcn/bar.c
> > index 05daa8fcbbc..d7e2d755685 100644
> > --- a/libgomp/config/gcn/bar.c
> > +++ b/libgomp/config/gcn/bar.c
> > @@ -89,7 +89,9 @@ gomp_team_barrier_wait_end (gomp_barrier_t *bar, gomp_barrier_state_t state)
> >
> >         bar->awaited = bar->total;
> >         team->work_share_cancelled = 0;
> > -      if (__builtin_expect (team->task_count, 0))
> > +      unsigned task_count
> > +     = __atomic_load_n (&team->task_count, MEMMODEL_ACQUIRE);
> > +      if (__builtin_expect (task_count, 0))
> >       {
> >         gomp_barrier_handle_tasks (state);
> >         state &= ~BAR_WAS_LAST;
> > @@ -164,7 +166,9 @@ gomp_team_barrier_wait_cancel_end (gomp_barrier_t *bar,
> >
> >         bar->awaited = bar->total;
> >         team->work_share_cancelled = 0;
> > -      if (__builtin_expect (team->task_count, 0))
> > +      unsigned task_count
> > +     = __atomic_load_n (&team->task_count, MEMMODEL_ACQUIRE);
> > +      if (__builtin_expect (task_count, 0))
> >       {
> >         gomp_barrier_handle_tasks (state);
> >         state &= ~BAR_WAS_LAST;
> > diff --git a/libgomp/config/gcn/bar.h b/libgomp/config/gcn/bar.h
> > index 8fdd6465822..1f28f579092 100644
> > --- a/libgomp/config/gcn/bar.h
> > +++ b/libgomp/config/gcn/bar.h
> > @@ -162,7 +162,10 @@ gomp_team_barrier_cancelled (gomp_barrier_t *bar)
> >   static inline void
> >   gomp_team_barrier_done (gomp_barrier_t *bar, gomp_barrier_state_t state)
> >   {
> > -  bar->generation = (state & -BAR_INCR) + BAR_INCR;
> > +  /* Need the atomic store for acquire-release synchronisation with the
> > +     load in `gomp_team_barrier_wait_{cancel_,}end`.  See PR122356.  */
> > +  __atomic_store_n (&bar->generation, (state & -BAR_INCR) + BAR_INCR,
> > +                   MEMMODEL_RELEASE);
> >   }
> >
> >   static inline bool
> > diff --git a/libgomp/config/linux/bar.c b/libgomp/config/linux/bar.c
> > index 2c9d1ce894d..9eaec0e5f23 100644
> > --- a/libgomp/config/linux/bar.c
> > +++ b/libgomp/config/linux/bar.c
> > @@ -90,7 +90,9 @@ gomp_team_barrier_wait_end (gomp_barrier_t *bar, gomp_barrier_state_t state)
> >
> >         bar->awaited = bar->total;
> >         team->work_share_cancelled = 0;
> > -      if (__builtin_expect (team->task_count, 0))
> > +      unsigned task_count
> > +     = __atomic_load_n (&team->task_count, MEMMODEL_ACQUIRE);
> > +      if (__builtin_expect (task_count, 0))
> >       {
> >         gomp_barrier_handle_tasks (state);
> >         state &= ~BAR_WAS_LAST;
> > @@ -154,7 +156,9 @@ gomp_team_barrier_wait_cancel_end (gomp_barrier_t *bar,
> >
> >         bar->awaited = bar->total;
> >         team->work_share_cancelled = 0;
> > -      if (__builtin_expect (team->task_count, 0))
> > +      unsigned task_count
> > +     = __atomic_load_n (&team->task_count, MEMMODEL_ACQUIRE);
> > +      if (__builtin_expect (task_count, 0))
> >       {
> >         gomp_barrier_handle_tasks (state);
> >         state &= ~BAR_WAS_LAST;
> > diff --git a/libgomp/config/linux/bar.h b/libgomp/config/linux/bar.h
> > index 9fb514526c4..faa03746d8f 100644
> > --- a/libgomp/config/linux/bar.h
> > +++ b/libgomp/config/linux/bar.h
> > @@ -162,7 +162,10 @@ gomp_team_barrier_cancelled (gomp_barrier_t *bar)
> >   static inline void
> >   gomp_team_barrier_done (gomp_barrier_t *bar, gomp_barrier_state_t state)
> >   {
> > -  bar->generation = (state & -BAR_INCR) + BAR_INCR;
> > +  /* Need the atomic store for acquire-release synchronisation with the
> > +     load in `gomp_team_barrier_wait_{cancel_,}end`.  See PR122356.  */
> > +  __atomic_store_n (&bar->generation, (state & -BAR_INCR) + BAR_INCR,
> > +                   MEMMODEL_RELEASE);
> >   }
> >
> >   static inline bool
> > diff --git a/libgomp/config/posix/bar.c b/libgomp/config/posix/bar.c
> > index d728c9c0afe..a86b2f38c2d 100644
> > --- a/libgomp/config/posix/bar.c
> > +++ b/libgomp/config/posix/bar.c
> > @@ -123,7 +123,9 @@ gomp_team_barrier_wait_end (gomp_barrier_t *bar, gomp_barrier_state_t state)
> >         struct gomp_team *team = thr->ts.team;
> >
> >         team->work_share_cancelled = 0;
> > -      if (team->task_count)
> > +      unsigned task_count
> > +     = __atomic_load_n (&team->task_count, MEMMODEL_ACQUIRE);
> > +      if (task_count)
> >       {
> >         gomp_barrier_handle_tasks (state);
> >         if (n > 0)
> > @@ -185,7 +187,9 @@ gomp_team_barrier_wait_cancel_end (gomp_barrier_t *bar,
> >         struct gomp_team *team = thr->ts.team;
> >
> >         team->work_share_cancelled = 0;
> > -      if (team->task_count)
> > +      unsigned task_count
> > +     = __atomic_load_n (&team->task_count, MEMMODEL_ACQUIRE);
> > +      if (task_count)
> >       {
> >         gomp_barrier_handle_tasks (state);
> >         if (n > 0)
> > diff --git a/libgomp/config/posix/bar.h b/libgomp/config/posix/bar.h
> > index a48e99488bb..35b94e43ce2 100644
> > --- a/libgomp/config/posix/bar.h
> > +++ b/libgomp/config/posix/bar.h
> > @@ -152,7 +152,10 @@ gomp_team_barrier_cancelled (gomp_barrier_t *bar)
> >   static inline void
> >   gomp_team_barrier_done (gomp_barrier_t *bar, gomp_barrier_state_t state)
> >   {
> > -  bar->generation = (state & -BAR_INCR) + BAR_INCR;
> > +  /* Need the atomic store for acquire-release synchronisation with the
> > +     load in `gomp_team_barrier_wait_{cancel_,}end`.  See PR122356.  */
> > +  __atomic_store_n (&bar->generation, (state & -BAR_INCR) + BAR_INCR,
> > +                   MEMMODEL_RELEASE);
> >   }
> >
> >   static inline bool
> > diff --git a/libgomp/config/rtems/bar.h b/libgomp/config/rtems/bar.h
> > index ea39679e36d..5c8c074c08f 100644
> > --- a/libgomp/config/rtems/bar.h
> > +++ b/libgomp/config/rtems/bar.h
> > @@ -164,7 +164,10 @@ gomp_team_barrier_cancelled (gomp_barrier_t *bar)
> >   static inline void
> >   gomp_team_barrier_done (gomp_barrier_t *bar, gomp_barrier_state_t state)
> >   {
> > -  bar->generation = (state & -BAR_INCR) + BAR_INCR;
> > +  /* Need the atomic store for acquire-release synchronisation with the
> > +     load in `gomp_team_barrier_wait_{cancel_,}end`.  See PR122356.  */
> > +  __atomic_store_n (&bar->generation, (state & -BAR_INCR) + BAR_INCR,
> > +                   MEMMODEL_RELEASE);
> >   }
> >
> >   static inline bool
> > diff --git a/libgomp/task.c b/libgomp/task.c
> > index b7f4b8220c2..5965e781f7e 100644
> > --- a/libgomp/task.c
> > +++ b/libgomp/task.c
> > @@ -1702,7 +1702,13 @@ gomp_barrier_handle_tasks (gomp_barrier_state_t state)
> >             if (do_wake > new_tasks)
> >               do_wake = new_tasks;
> >           }
> > -       --team->task_count;
> > +       /* Need to use RELEASE to sync with barrier read outside of the
> > +          tasking code (See PR122356).  Only care when decrementing to zero
> > +          because that's what the barrier cares about.  */
> > +       if (team->task_count == 1)
> > +         __atomic_store_n (&team->task_count, 0, MEMMODEL_RELEASE);
> > +       else
> > +         team->task_count--;
> >       }
> >       }
> >   }
> > diff --git a/libgomp/testsuite/libgomp.c/pr122356.c b/libgomp/testsuite/libgomp.c/pr122356.c
> > new file mode 100644
> > index 00000000000..5dbf418619c
> > --- /dev/null
> > +++ b/libgomp/testsuite/libgomp.c/pr122356.c
> > @@ -0,0 +1,33 @@
> > +#include <omp.h>
> > +
> > +void abort ();
> > +
> > +#define NUM_THREADS 8
> > +unsigned full_data[NUM_THREADS] = {0};
> > +void
> > +test ()
> > +{
> > +#pragma omp parallel num_threads(8)
> > +  {
> > +#pragma omp for
> > +    for (int i = 0; i < 10; i++)
> > +#pragma omp task
> > +      {
> > +     full_data[omp_get_thread_num ()] += 1;
> > +      }
> > +#pragma omp barrier
> > +
> > +    unsigned total = 0;
> > +    for (int i = 0; i < NUM_THREADS; i++)
> > +      total += full_data[i];
> > +
> > +    if (total != 10)
> > +      abort ();
> > +  }
> > +}
> > +
> > +int
> > +main ()
> > +{
> > +  test ();
> > +}
libgomp: Ensure memory sync after performing tasks.

As described in PR 122356, there is a theoretical bug around not
"publishing" user data written in a task when that task has been
executed by a thread after entry to a barrier.

Key points of the C memory model that are relevant:
1) Memory writes can be seen in a different order in different threads.
2) When one thread (A) reads a value with acquire memory ordering that
   another thread (B) has written with release memory ordering, then all
   data written in thread (B) before the write that set this value will
   be visible to thread (A) after that read (see the sketch below).
3) This point requires that the read and write operate on the same
   value.  The guarantee is one-way:  It specifies that thread (A) will
   see the writes that thread (B) has performed before the specified
   write.  It does not specify that thread (B) will see writes that
   thread (A) has performed before reading this value.
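
To make point (2) concrete, here is a minimal stand-alone sketch (not
part of the patch; all names are illustrative) using the `__atomic`
builtins that libgomp's MEMMODEL_RELEASE/MEMMODEL_ACQUIRE constants
correspond to:

  #include <pthread.h>
  #include <stdio.h>

  static int data;   /* user data, written with a plain store */
  static int flag;   /* the "publishing" variable */

  static void *writer (void *arg)  /* thread (B) */
  {
    data = 42;                                      /* plain write */
    __atomic_store_n (&flag, 1, __ATOMIC_RELEASE);  /* publish it */
    return NULL;
  }

  static void *reader (void *arg)  /* thread (A) */
  {
    while (!__atomic_load_n (&flag, __ATOMIC_ACQUIRE))
      ;                       /* spin until the release store is seen */
    printf ("%d\n", data);    /* guaranteed to print 42 */
    return NULL;
  }

  int
  main ()
  {
    pthread_t a, b;
    pthread_create (&b, NULL, writer, NULL);
    pthread_create (&a, NULL, reader, NULL);
    pthread_join (b, NULL);
    pthread_join (a, NULL);
    return 0;
  }

Weaken either memory order to __ATOMIC_RELAXED and the guarantee on
`data` disappears; that is exactly the class of problem outlined next.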

Outline of the issue:
1) While there is a memory sync at entry to the barrier, user code can
   be run after threads have all entered the barrier.
2) There are various points where a memory sync can occur after entry to
   the barrier:
   - One thread getting the `task_lock` mutex that another thread has
     released.
   - Last thread incrementing `bar->generation` with `MEMMODEL_RELEASE`
     and some other thread reading it with `MEMMODEL_ACQUIRE`.
   However, there are code paths that can avoid these points.
3) On the code-paths that can avoid these points we could have no memory
   synchronisation between a write to user data that happened in a task
   executed after entry to the barrier, and some other thread running
   the implicit task after the barrier.  Hence that "other thread" may
   read a stale value that should have been overwritten in the explicit
   task.

There are two code-paths that I believe I've identified:
1) The last thread sees `task_count == 0` and increments the generation
   with `MEMMODEL_RELEASE` before continuing on to the next implicit
   task.
   If some other thread had executed a task that wrote user data I
   don't see any way in which an acquire-release ordering *from* the
   thread writing user data *to* the last thread would have been formed.
2) After all threads have entered the barrier: some thread (A) is
   waiting in `do_wait`.  Some other thread (B) completes a task writing
   user data.  Thread (B) increments the generation using
   `gomp_team_barrier_done` (non-atomically -- hence not allowing the
   formation of any acquire-release ordering with this write).  Thread
   (A) reads that generation with `MEMMODEL_ACQUIRE`, but since the
   write was not atomic that does not form an ordering.
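
The second code-path boils down to the following deliberately broken,
stand-alone sketch (all names are illustrative, not libgomp's): the
release-side store is plain, so the acquire load pairs with nothing and
a stale read is a permitted outcome.

  #include <pthread.h>
  #include <stdio.h>

  static int user_data;        /* written by the task in thread (B) */
  static unsigned generation;  /* stands in for bar->generation */

  static void *thread_b (void *arg)
  {
    user_data = 42;  /* the task's write */
    generation = 1;  /* plain store, as in the old gomp_team_barrier_done */
    return NULL;
  }

  static void *thread_a (void *arg)
  {
    while (__atomic_load_n (&generation, __ATOMIC_ACQUIRE) != 1)
      ;                          /* acquire load with no paired release... */
    printf ("%d\n", user_data);  /* ...so reading a stale 0 is allowed */
    return NULL;
  }

  int
  main ()
  {
    pthread_t a, b;
    pthread_create (&a, NULL, thread_a, NULL);
    pthread_create (&b, NULL, thread_b, NULL);
    pthread_join (a, NULL);
    pthread_join (b, NULL);
    return 0;
  }

Building this with -fsanitize=thread should report the mixed
plain/atomic access to `generation` as a data race.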

This patch makes two changes:
1) The write that sets `task_count` to zero in
   `gomp_barrier_handle_tasks` is made atomic, as is the read of
   `task_count` in `gomp_team_barrier_wait_end`.  This addresses the
   first case by forming an acquire-release ordering *from* the thread
   executing tasks *to* the thread that will increment the generation
   and continue.
2) The write of `bar->generation` via `gomp_team_barrier_done` called
   from `gomp_barrier_handle_tasks` is done atomically.  This means that
   it will form an acquire-release synchronisation with the existing
   atomic read of `bar->generation` in the main loop of
   `gomp_team_barrier_wait_end`.
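
In condensed form (the code is taken from the hunks below; `team` and
`bar` are the usual libgomp structures), the two release/acquire
pairings the patch establishes are:

  /* Change 1: the task-executing thread releases the final count...  */
  if (team->task_count == 1)
    __atomic_store_n (&team->task_count, 0, MEMMODEL_RELEASE);

  /* ...which pairs with the new acquire load in
     gomp_team_barrier_wait_end:  */
  unsigned task_count
    = __atomic_load_n (&team->task_count, MEMMODEL_ACQUIRE);

  /* Change 2: gomp_team_barrier_done now releases bar->generation...  */
  __atomic_store_n (&bar->generation, (state & -BAR_INCR) + BAR_INCR,
                    MEMMODEL_RELEASE);

  /* ...which pairs with the existing acquire load of bar->generation
     in the wait loop of gomp_team_barrier_wait_end.  */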

Testing done:
- Bootstrap & regtest on aarch64 and x86_64.
  - With & without _LIBGOMP_CHECKING_.
  - Testsuite with & without OMP_WAIT_POLICY=passive
- Cross compilation & regtest on arm.
- TSAN run on this as part of all my upstream patches.

libgomp/ChangeLog:

        PR libgomp/122356
        * config/gcn/bar.c (gomp_team_barrier_wait_end): Atomically read
        team->task_count.
        (gomp_team_barrier_wait_cancel_end): Likewise.
        * config/gcn/bar.h (gomp_team_barrier_done): Atomically write
        bar->generation.
        * config/linux/bar.c (gomp_team_barrier_wait_end): Atomically
        read team->task_count.
        (gomp_team_barrier_wait_cancel_end): Likewise.
        * config/linux/bar.h (gomp_team_barrier_done): Atomically write
        bar->generation.
        * config/posix/bar.c (gomp_team_barrier_wait_end): Atomically
        read team->task_count.
        (gomp_team_barrier_wait_cancel_end): Likewise.
        * config/posix/bar.h (gomp_team_barrier_done): Atomically write
        bar->generation.
        * config/rtems/bar.h (gomp_team_barrier_done): Atomically write
        bar->generation.
        * task.c (gomp_barrier_handle_tasks): Atomically write
        team->task_count when decrementing to zero.
        * testsuite/libgomp.c/pr122356.c: New test.

Signed-off-by: Matthew Malcomson <[email protected]>

diff --git a/libgomp/config/gcn/bar.c b/libgomp/config/gcn/bar.c
index 10c3f5d1362..3045587f0f3 100644
--- a/libgomp/config/gcn/bar.c
+++ b/libgomp/config/gcn/bar.c
@@ -89,7 +89,9 @@ gomp_team_barrier_wait_end (gomp_barrier_t *bar, gomp_barrier_state_t state)
 
       bar->awaited = bar->total;
       team->work_share_cancelled = 0;
-      if (__builtin_expect (team->task_count, 0))
+      unsigned task_count
+       = __atomic_load_n (&team->task_count, MEMMODEL_ACQUIRE);
+      if (__builtin_expect (task_count, 0))
        {
          gomp_barrier_handle_tasks (state);
          state &= ~BAR_WAS_LAST;
@@ -164,7 +166,9 @@ gomp_team_barrier_wait_cancel_end (gomp_barrier_t *bar,
 
       bar->awaited = bar->total;
       team->work_share_cancelled = 0;
-      if (__builtin_expect (team->task_count, 0))
+      unsigned task_count
+       = __atomic_load_n (&team->task_count, MEMMODEL_ACQUIRE);
+      if (__builtin_expect (task_count, 0))
        {
          gomp_barrier_handle_tasks (state);
          state &= ~BAR_WAS_LAST;
diff --git a/libgomp/config/gcn/bar.h b/libgomp/config/gcn/bar.h
index 0507efb7d2d..6e838ff54a8 100644
--- a/libgomp/config/gcn/bar.h
+++ b/libgomp/config/gcn/bar.h
@@ -162,7 +162,10 @@ gomp_team_barrier_cancelled (gomp_barrier_t *bar)
 static inline void
 gomp_team_barrier_done (gomp_barrier_t *bar, gomp_barrier_state_t state)
 {
-  bar->generation = (state & -BAR_INCR) + BAR_INCR;
+  /* Need the atomic store for acquire-release synchronisation with the
+     load in `gomp_team_barrier_wait_{cancel_,}end`.  See PR122356.  */
+  __atomic_store_n (&bar->generation, (state & -BAR_INCR) + BAR_INCR,
+                   MEMMODEL_RELEASE);
 }
 
 static inline bool
diff --git a/libgomp/config/linux/bar.c b/libgomp/config/linux/bar.c
index 2a1b052b11e..bbdfc896391 100644
--- a/libgomp/config/linux/bar.c
+++ b/libgomp/config/linux/bar.c
@@ -90,7 +90,9 @@ gomp_team_barrier_wait_end (gomp_barrier_t *bar, gomp_barrier_state_t state)
 
       bar->awaited = bar->total;
       team->work_share_cancelled = 0;
-      if (__builtin_expect (team->task_count, 0))
+      unsigned task_count
+       = __atomic_load_n (&team->task_count, MEMMODEL_ACQUIRE);
+      if (__builtin_expect (task_count, 0))
        {
          gomp_barrier_handle_tasks (state);
          state &= ~BAR_WAS_LAST;
@@ -154,7 +156,9 @@ gomp_team_barrier_wait_cancel_end (gomp_barrier_t *bar,
 
       bar->awaited = bar->total;
       team->work_share_cancelled = 0;
-      if (__builtin_expect (team->task_count, 0))
+      unsigned task_count
+       = __atomic_load_n (&team->task_count, MEMMODEL_ACQUIRE);
+      if (__builtin_expect (task_count, 0))
        {
          gomp_barrier_handle_tasks (state);
          state &= ~BAR_WAS_LAST;
diff --git a/libgomp/config/linux/bar.h b/libgomp/config/linux/bar.h
index b1fff01105a..4dc0d3cca99 100644
--- a/libgomp/config/linux/bar.h
+++ b/libgomp/config/linux/bar.h
@@ -162,7 +162,10 @@ gomp_team_barrier_cancelled (gomp_barrier_t *bar)
 static inline void
 gomp_team_barrier_done (gomp_barrier_t *bar, gomp_barrier_state_t state)
 {
-  bar->generation = (state & -BAR_INCR) + BAR_INCR;
+  /* Need the atomic store for acquire-release synchronisation with the
+     load in `gomp_team_barrier_wait_{cancel_,}end`.  See PR122356.  */
+  __atomic_store_n (&bar->generation, (state & -BAR_INCR) + BAR_INCR,
+                   MEMMODEL_RELEASE);
 }
 
 static inline bool
diff --git a/libgomp/config/posix/bar.c b/libgomp/config/posix/bar.c
index ce69905ba67..c46659bd264 100644
--- a/libgomp/config/posix/bar.c
+++ b/libgomp/config/posix/bar.c
@@ -123,7 +123,9 @@ gomp_team_barrier_wait_end (gomp_barrier_t *bar, gomp_barrier_state_t state)
       struct gomp_team *team = thr->ts.team;
 
       team->work_share_cancelled = 0;
-      if (team->task_count)
+      unsigned task_count
+       = __atomic_load_n (&team->task_count, MEMMODEL_ACQUIRE);
+      if (task_count)
        {
          gomp_barrier_handle_tasks (state);
          if (n > 0)
@@ -185,7 +187,9 @@ gomp_team_barrier_wait_cancel_end (gomp_barrier_t *bar,
       struct gomp_team *team = thr->ts.team;
 
       team->work_share_cancelled = 0;
-      if (team->task_count)
+      unsigned task_count
+       = __atomic_load_n (&team->task_count, MEMMODEL_ACQUIRE);
+      if (task_count)
        {
          gomp_barrier_handle_tasks (state);
          if (n > 0)
diff --git a/libgomp/config/posix/bar.h b/libgomp/config/posix/bar.h
index 5a175c228c2..026daca793d 100644
--- a/libgomp/config/posix/bar.h
+++ b/libgomp/config/posix/bar.h
@@ -152,7 +152,10 @@ gomp_team_barrier_cancelled (gomp_barrier_t *bar)
 static inline void
 gomp_team_barrier_done (gomp_barrier_t *bar, gomp_barrier_state_t state)
 {
-  bar->generation = (state & -BAR_INCR) + BAR_INCR;
+  /* Need the atomic store for acquire-release synchronisation with the
+     load in `gomp_team_barrier_wait_{cancel_,}end`.  See PR122356.  */
+  __atomic_store_n (&bar->generation, (state & -BAR_INCR) + BAR_INCR,
+                   MEMMODEL_RELEASE);
 }
 
 static inline bool
diff --git a/libgomp/config/rtems/bar.h b/libgomp/config/rtems/bar.h
index 61fa91f300f..80fb1cd3be8 100644
--- a/libgomp/config/rtems/bar.h
+++ b/libgomp/config/rtems/bar.h
@@ -164,7 +164,10 @@ gomp_team_barrier_cancelled (gomp_barrier_t *bar)
 static inline void
 gomp_team_barrier_done (gomp_barrier_t *bar, gomp_barrier_state_t state)
 {
-  bar->generation = (state & -BAR_INCR) + BAR_INCR;
+  /* Need the atomic store for acquire-release synchronisation with the
+     load in `gomp_team_barrier_wait_{cancel_,}end`.  See PR122356.  */
+  __atomic_store_n (&bar->generation, (state & -BAR_INCR) + BAR_INCR,
+                   MEMMODEL_RELEASE);
 }
 
 static inline bool
diff --git a/libgomp/task.c b/libgomp/task.c
index 554636aadd5..cbba28516e3 100644
--- a/libgomp/task.c
+++ b/libgomp/task.c
@@ -1702,7 +1702,13 @@ gomp_barrier_handle_tasks (gomp_barrier_state_t state)
              if (do_wake > new_tasks)
                do_wake = new_tasks;
            }
-         --team->task_count;
+         /* Need to use RELEASE to sync with barrier read outside of the
+            tasking code (See PR122356).  Only care when decrementing to zero
+            because that's what the barrier cares about.  */
+         if (team->task_count == 1)
+           __atomic_store_n (&team->task_count, 0, MEMMODEL_RELEASE);
+         else
+           team->task_count--;
        }
     }
 }
diff --git a/libgomp/testsuite/libgomp.c/pr122356.c b/libgomp/testsuite/libgomp.c/pr122356.c
new file mode 100644
index 00000000000..76879511ff2
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/pr122356.c
@@ -0,0 +1,40 @@
+#include <omp.h>
+
+void abort ();
+
+#define NUM_THREADS 8
+unsigned full_data[NUM_THREADS] = {0};
+#pragma omp declare target enter(full_data)
+
+void
+test ()
+{
+#pragma omp parallel num_threads(8)
+  {
+#pragma omp for
+    for (int i = 0; i < 10; i++)
+#pragma omp task
+      {
+       full_data[omp_get_thread_num ()] += 1;
+      }
+#pragma omp barrier
+
+    unsigned total = 0;
+    for (int i = 0; i < NUM_THREADS; i++)
+      total += full_data[i];
+
+    if (total != 10)
+      abort ();
+  }
+}
+#pragma omp declare target enter(test)
+
+
+int
+main ()
+{
+  test ();
+
+  #pragma omp target
+    test ();
+}
