On 8/20/20 3:03 PM, Chung-Lin Tang wrote: > Hi Tom, > this patch adjusts nvptx_free() in libgomp/plugin/plugin-nvptx.c to avoid a > "GOMP_PLUGIN_acc_thread() == NULL" check that was causing problems under > OpenMP offloading. > > This check was originally used to determine if nvptx_free() was running > under > CUDA callback context, when freeing resources from an OpenACC > asynchronous compute > region. Since CUDA API calls are not allowed inside callback context, we > have > to save the freed block to ptx_dev->free_blocks, and cuMemFree it later. > > The check to see if GOMP_PLUGIN_acc_thread() exists to determine normal > host thread > vs. callback thread worked under -fopenacc, but since the OpenACC > per-thread data > is not created under -fopenmp, and always returned NULL, we have a leak > situation > where OpenMP offloading kept accumulating freed device memory blocks > until failing; > nvptx_free() never reaches the part where cuMemFree() is actually called. > > I reviewed the CUDA API docs and see that CUDA_ERROR_NOT_PERMITTED is > returned > for such CUDA calls inside callback context,
Right, that's "Callbacks must not make any CUDA API calls. Attempting to use a CUDA API will result in CUDA_ERROR_NOT_PERMITTED" at cuStreamAddCallback. Perhaps mention more precisely where you found this. > and it appears to be enough > to replace > the current check, so the new code sees if this error is returned on the > first > cuMemGetAddressRange call to determine callback context. This should now > work > for both OpenACC/OpenMP. > > (Tobias, Catherine, the earlier internal patch to re-organize this > callback context > checking did not work in general, since OpenACC also uses the > .queue_callback > functionality to free the struct target_mem_desc asynchronously, so in > general we > have to ensure nvptx_free() could be used under both normal/callback > context) > > This patch has been libgomp tested for x86_64-linux with nvptx > offloading without > regressions, and should be applied for mainline and GCC10. Is this okay? > Ok, thanks for fixing this. - Tom > Thanks, > Chung-Lin > > 2020-08-20 Chung-Lin Tang <clt...@codesourcery.com> > > libgomp/ > * plugin/plugin-nvptx.c (nvptx_free): Change "GOMP_PLUGIN_acc_thread > () == NULL" > test into check of CUDA_ERROR_NOT_PERMITTED status for > cuMemGetAddressRange. > Adjust comments. > diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c > index ec103a2f40b..188a34f1d04 100644 > --- a/libgomp/plugin/plugin-nvptx.c > +++ b/libgomp/plugin/plugin-nvptx.c > @@ -1038,27 +1038,34 @@ goacc_profiling_acc_ev_free (struct goacc_thread > *thr, void *p) > } > > static bool > nvptx_free (void *p, struct ptx_device *ptx_dev) > { > - /* Assume callback context if this is null. */ > - if (GOMP_PLUGIN_acc_thread () == NULL) > + CUdeviceptr pb; > + size_t ps; > + > + CUresult r = CUDA_CALL_NOCHECK (cuMemGetAddressRange, &pb, &ps, > + (CUdeviceptr) p); > + if (r == CUDA_ERROR_NOT_PERMITTED) > { > + /* We assume that this error indicates we are in a CUDA callback > context, > + where all CUDA calls are not allowed. Arrange to free this piece of > + device memory later. */ > struct ptx_free_block *n > = GOMP_PLUGIN_malloc (sizeof (struct ptx_free_block)); > n->ptr = p; > pthread_mutex_lock (&ptx_dev->free_blocks_lock); > n->next = ptx_dev->free_blocks; > ptx_dev->free_blocks = n; > pthread_mutex_unlock (&ptx_dev->free_blocks_lock); > return true; > } > - > - CUdeviceptr pb; > - size_t ps; > - > - CUDA_CALL (cuMemGetAddressRange, &pb, &ps, (CUdeviceptr) p); > + else if (r != CUDA_SUCCESS) > + { > + GOMP_PLUGIN_error ("cuMemGetAddressRange error: %s", cuda_error (r)); > + return false; > + } > if ((CUdeviceptr) p != pb) > { > GOMP_PLUGIN_error ("invalid device address"); > return false; > }