https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88941
--- Comment #2 from Tom de Vries <vries at gcc dot gnu.org> ---
(In reply to Tom de Vries from comment #1)
> At the acc_shutdown documentation, we read:
> ..
> - This routine may not be called during execution of an accelerator compute
> region.
> - If the program attempts to execute a compute region or access any device
> data on such a device, the behavior is undefined.
> ...
>
> The lib-82.c testcase launches kernels asynchronously (not by using parallel
> async, but by using cuLaunchKernel).
>
> The documentation of acc_shutdown seems to imply that we need to wait for
> all those streams finish before calling acc_shutdown.
>
> There is a wait call before acc_shutdown:
> ...
> acc_wait_all_async (0);
> ...
> but the semantics for that one is:
> ...
> The acc_wait_all_async routine enqueues wait operations on one async queue
> for the operations previously enqueued on all other async queues.
> ...
>
> ISTM that this can't guarantee that all queues have finished.
>
> OTOH, using acc_wait_all, with semantics:
> ...
> The acc_wait_all routine waits for completion of all asynchronous operations.
> ...
> seems to guarantee that, and indeed using this call instead fixed the
> lib-82.c failure.

Filed as PR88942 - "[openacc, testsuite] lib-82.c does not wait for all streams before calling acc_shutdown"
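
For illustration only, the difference between the two wait calls quoted above can be pictured with a toy host-side model. This is a sketch under stated assumptions, not libgomp code and not the lib-82.c testcase: the `model_*` names, `struct model` and `NQUEUES` are hypothetical, and each async queue is reduced to a counter of operations still in flight.

```c
#include <assert.h>
#include <stdbool.h>

#define NQUEUES 4  /* hypothetical number of async queues */

/* Toy model: pending[q] counts operations still in flight on queue q.  */
struct model { int pending[NQUEUES]; };

/* Models acc_wait_all_async (q) as quoted from the documentation: a wait
   operation is enqueued on queue q for the work on the other queues.
   The host returns immediately, so nothing has completed yet.  */
static void
model_wait_all_async (struct model *m, int q)
{
  assert (q >= 0 && q < NQUEUES);
  m->pending[q] += 1;  /* the enqueued wait is itself pending work */
}

/* Models acc_wait_all (): the host blocks until every queue has drained.  */
static void
model_wait_all (struct model *m)
{
  for (int q = 0; q < NQUEUES; q++)
    m->pending[q] = 0;
}

/* Precondition for shutdown as read from the acc_shutdown documentation:
   no work may still be executing on the device.  */
static bool
model_safe_to_shutdown (const struct model *m)
{
  for (int q = 0; q < NQUEUES; q++)
    if (m->pending[q] != 0)
      return false;
  return true;
}
```

In this model, calling `model_wait_all_async (&m, 0)` on a state with pending work leaves `model_safe_to_shutdown` false, while `model_wait_all (&m)` makes it true, which mirrors the argument for replacing `acc_wait_all_async (0)` with `acc_wait_all ()` before `acc_shutdown`.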