https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99555

Tom de Vries <vries at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amonakov at gcc dot gnu.org

--- Comment #6 from Tom de Vries <vries at gcc dot gnu.org> ---
Current theory ...

All omp-threads are supposed to participate in a team barrier, and then all
together move on.  The master omp-thread participates from gomp_team_end, the
other omp-threads from the worker loop in gomp_thread_start.

Instead, it seems the master omp-thread gets stuck at the team barrier while
all other omp-threads move on to the thread pool barrier, and that state
corresponds to the observed hang.

AFAICT, the problem starts when gomp_team_barrier_wake is called with count ==
1:
...
void
gomp_team_barrier_wake (gomp_barrier_t *bar, int count)
{
  if (bar->total > 1)
    asm ("bar.sync 1, %0;" : : "r" (32 * bar->total));
}
...
The count argument is ignored, and instead all omp-threads are woken up, which
causes omp-threads to escape the team barrier.
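
To illustrate the difference between a wake that honors count and one that
ignores it, here is a toy single-threaded model.  It is not libgomp code: the
toy_* names and the fixed-size waiter array are hypothetical, and it only
counts how many blocked waiters each wake policy releases.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_WAITERS 8

/* Hypothetical toy model, not libgomp code: blocked[i] is true while
   waiter i is still parked at the barrier.  */
struct toy_barrier
{
  bool blocked[TOY_MAX_WAITERS];
  size_t total;
};

/* Park TOTAL waiters (capped at TOY_MAX_WAITERS) at the barrier.  */
static void
toy_init (struct toy_barrier *bar, size_t total)
{
  bar->total = total < TOY_MAX_WAITERS ? total : TOY_MAX_WAITERS;
  for (size_t i = 0; i < TOY_MAX_WAITERS; i++)
    bar->blocked[i] = i < bar->total;
}

/* Number of waiters still parked.  */
static size_t
toy_blocked (const struct toy_barrier *bar)
{
  size_t n = 0;
  for (size_t i = 0; i < bar->total; i++)
    n += bar->blocked[i];
  return n;
}

/* Release at most COUNT waiters, the way a futex-style wake would.
   Returns the number of waiters released.  */
static size_t
toy_wake_count (struct toy_barrier *bar, size_t count)
{
  size_t released = 0;
  for (size_t i = 0; i < bar->total && released < count; i++)
    if (bar->blocked[i])
      {
        bar->blocked[i] = false;
        released++;
      }
  return released;
}

/* Release every waiter and ignore COUNT, modelling the behaviour
   described above for the bar.sync-based wake.  */
static size_t
toy_wake_all (struct toy_barrier *bar, size_t count)
{
  (void) count;
  size_t released = 0;
  for (size_t i = 0; i < bar->total; i++)
    if (bar->blocked[i])
      {
        bar->blocked[i] = false;
        released++;
      }
  return released;
}
```

With four parked waiters and count == 1, toy_wake_count leaves three waiters
at the barrier, while toy_wake_all leaves none, which is the "escape" in the
theory above.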

All of this results from the gomp_barrier_handle_tasks path being taken in
gomp_team_barrier_wait_end.  I haven't figured out why that path is triggered,
so the root cause may still lie elsewhere.

Anyway, the nvptx bar.{c,h} is a copy of linux/bar.{c,h}, which is implemented
using futex, with the futex uses replaced by bar.sync uses.

FWIW, replacing libgomp/config/nvptx/bar.{c,h} with
libgomp/config/posix/bar.{c,h} fixes the problem.  I did a full libgomp test
run, and all problems were fixed.
