Loops lacking exit edges can trigger a code generation bug in the NVIDIA
driver when it targets sm_50. In this case the bug manifested as corruption of
the stack pointer (SASS register R1). Adjusting the source by hand to provide
a cheap exit branch seems to be the most reasonable workaround. NVIDIA bug ID
200177879.
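
The shape of the workaround can be sketched in isolation as below. This is
only an illustration, not the libgomp code: `loop_guard`, `worker_loop`,
`poll_fn` and `stop_on_third_call` are hypothetical names, and the `volatile`
qualifier is an assumption made here so that a host compiler cannot fold the
exit test away. The loop keeps its "run forever" behaviour in practice while
still having an exit edge.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical guard: a pointer that stays non-null in normal operation,
   playing the role of the global tested by the patched loop.  */
static int guard_storage;
int *volatile loop_guard = &guard_storage;

/* Hypothetical work source: returns 0 when there is nothing to run.  */
typedef int (*poll_fn) (void *ctx);

/* Worker loop written as do/while so that it has an exit edge.
   Returns the number of iterations executed.  */
unsigned long
worker_loop (poll_fn poll, void *ctx)
{
  unsigned long iterations = 0;
  assert (poll != NULL);
  do
    {
      iterations++;
      if (!poll (ctx))
        continue;  /* In a do/while, continue jumps to the exit test.  */
      /* ... run the work item here ...  */
    }
  while (loop_guard);
  return iterations;
}

/* Example callback: clears the guard on its third call so that the
   sketch terminates; real code would never take the exit.  */
static int
stop_on_third_call (void *ctx)
{
  int *calls = ctx;
  if (++*calls == 3)
    loop_guard = NULL;
  return *calls % 2;
}
```

Calling `worker_loop (stop_on_third_call, &calls)` with `calls` initialised
to 0 runs the body three times and then leaves through the exit edge.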
* config/nvptx/team.c (gomp_thread_start): Work around NVIDIA driver
bug by adding an exit edge to the loop.
---
libgomp/ChangeLog.gomp-nvptx | 5 +++++
libgomp/config/nvptx/team.c | 6 +++++-
2 files changed, 10 insertions(+), 1 deletion(-)
diff --git a/libgomp/config/nvptx/team.c b/libgomp/config/nvptx/team.c
index 933f5a0..0291539 100644
--- a/libgomp/config/nvptx/team.c
+++ b/libgomp/config/nvptx/team.c
@@ -84,7 +84,7 @@ gomp_thread_start (struct gomp_thread_pool *pool)
gomp_sem_init (&thr->release, 0);
thr->thread_pool = pool;
- for (;;)
+ do
{
gomp_simple_barrier_wait (&pool->threads_dock);
if (!thr->fn)
@@ -96,6 +96,10 @@ gomp_thread_start (struct gomp_thread_pool *pool)
gomp_team_barrier_wait_final (&thr->ts.team->barrier);
gomp_finish_task (task);
}
+ /* Work around an NVIDIA driver bug: when generating sm_50 machine code,
+ it can trash stack pointer R1 in loops lacking exit edges. Add a cheap
+ artificial exit that the driver would not be able to optimize out. */
+ while (nvptx_thrs);
}
/* Launch a team. */