https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99555

--- Comment #14 from Thomas Schwinge <tschwinge at gcc dot gnu.org> ---
Regarding my previous report that after
commit r12-7332-g5ed77fb3ed1ee0289a0ec9499ef52b99b39421f1
"[libgomp, nvptx] Fix hang in gomp_team_barrier_wait_end"...

(In reply to Thomas Schwinge from comment #13)
> [...] on one system (only!), I'm [...] seeing regressions as follows:
> 
>     PASS: libgomp.c/../libgomp.c-c++-common/task-detach-10.c (test for excess errors)
>     {+WARNING: program timed out.+}
>     [-PASS:-]{+FAIL:+} libgomp.c/../libgomp.c-c++-common/task-detach-10.c execution test

..., and similar for all 'libgomp.c-c++-common/task-detach-10.c',
'libgomp.c-c++-common/task-detach-8.c', 'libgomp.fortran/task-detach-10.f90',
'libgomp.fortran/task-detach-8.f90' test cases:

> (Accumulated over a few runs; not always seeing all of those.)
> 
> That's with a Nvidia Tesla K20c GPU, Driver Version: 346.46.
> As that version is "a bit old", I shall first update this, before we spend
> any further time on analyzing this.

Cross-checking on another system with an Nvidia Tesla K20c GPU but a more
recent Driver Version, I'm not seeing this issue.

On the "old" system, I gradually upgraded the Driver Version from 346.46 to
352.99, 361.93.02, and 375.88 (always the latest (?) version of the respective
series); none of these resolved the problem.

Only 384.59 (that is, an early version of the 384.X series) resolved the
issue.  That's still using the GCC/nvptx '-mptx=3.1' multilib.

(We couldn't do so with the earlier series, but given this is 384.X, we may now
also cross-check with the default multilib, and that was fine, too.)
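
As a rough sketch (Python, not part of libgomp or its testsuite), the Driver
Version observations above could be recorded as follows.  The function names
and the three-way result are made up for illustration.  Versions between 375.88
and 384.59 were not tested, so the sketch reports them as unknown rather than
guessing.

```python
def parse_driver_version(text):
    """Turn a version string such as '361.93.02' into (361, 93, 2)."""
    return tuple(int(part) for part in text.strip().split("."))


# Observed on the "old" K20c system with the '-mptx=3.1' multilib:
# the task-detach tests timed out up to 375.88 and passed with 384.59.
LAST_OBSERVED_BAD = parse_driver_version("375.88")
FIRST_OBSERVED_GOOD = parse_driver_version("384.59")


def task_detach_expected_to_pass(version_text):
    """Return True/False per the observations, or None if untested range.

    Hypothetical helper: it extrapolates from five tested versions only.
    """
    version = parse_driver_version(version_text)
    if version >= FIRST_OBSERVED_GOOD:
        return True
    if version <= LAST_OBSERVED_BAD:
        return False
    return None  # somewhere between 375.88 and 384.59: not tested


if __name__ == "__main__":
    for v in ("346.46", "352.99", "361.93.02", "375.88", "384.59"):
        print(v, task_detach_expected_to_pass(v))
```

Something like this would only document what was seen on one system; it says
nothing about the cause.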

Now, I don't know whether we would like to spend any more effort on this
issue, given that it only appears with rather old pre-384.X versions -- but on
the other hand, the GCC/nvptx '-mptx=3.1' multilib is meant to keep these
supported?  (... which is why I'm running such testing; and the timeouts are
certainly annoying there.)

It might be another issue with pre-384.X versions of the Nvidia PTX JIT, or is
there a slight possibility that GCC is generating/libgomp contains some
"weird" code that 384.X and later versions happen to "fix up" -- probably the
former rather than the latter?  (Or, there's the chance of GPU hardware/firmware
or some other system weirdness -- unlikely, as the system otherwise behaves
totally fine?)

I don't know where to find complete Nvidia Driver/JIT release notes; the
375.X -> 384.X notes might provide an idea of what got fixed, and we might then
add another 'WORKAROUND_PTXJIT_BUG' for that -- maybe simple, maybe not.

Any thoughts, Tom?
