On 8/20/20 3:03 PM, Chung-Lin Tang wrote:
> Hi Tom,
> this patch adjusts nvptx_free() in libgomp/plugin/plugin-nvptx.c to avoid a
> "GOMP_PLUGIN_acc_thread() == NULL" check that was causing problems under
> OpenMP offloading.
> 
> This check was originally used to determine if nvptx_free() was running
> under
> CUDA callback context, when freeing resources from an OpenACC
> asynchronous compute
> region. Since CUDA API calls are not allowed inside callback context, we
> have
> to save the freed block to ptx_dev->free_blocks, and cuMemFree it later.
> 
> The check to see if GOMP_PLUGIN_acc_thread() exists to determine normal
> host thread
> vs. callback thread worked under -fopenacc, but since the OpenACC
> per-thread data
> is not created under -fopenmp, and always returned NULL, we have a leak
> situation
> where OpenMP offloading kept accumulating freed device memory blocks
> until failing;
> nvptx_free() never reaches the part where cuMemFree() is actually called.
> 
> I reviewed the CUDA API docs and see that CUDA_ERROR_NOT_PERMITTED is
> returned
> for such CUDA calls inside callback context,

Right, that's "Callbacks must not make any CUDA API calls. Attempting to
use a CUDA API will result in CUDA_ERROR_NOT_PERMITTED" at
cuStreamAddCallback.  Perhaps mention more precisely where you found this.

> and it appears to be enough
> to replace
> the current check, so the new code sees if this error is returned on the
> first
> cuMemGetAddressRange call to determine callback context. This should now
> work
> for both OpenACC/OpenMP.
> 
> (Tobias, Catherine, the earlier internal patch to re-organize this
> callback context
> checking did not work in general, since OpenACC also uses the
> .queue_callback
> functionality to free the struct target_mem_desc asynchronously, so in
> general we
> have to ensure nvptx_free() could be used under both normal/callback
> context)
> 
> This patch has been libgomp tested for x86_64-linux with nvptx
> offloading without
> regressions, and should be applied for mainline and GCC10. Is this okay?
> 

Ok, thanks for fixing this.

- Tom

> Thanks,
> Chung-Lin
> 
> 2020-08-20  Chung-Lin Tang  <clt...@codesourcery.com>
> 
>     libgomp/
>     * plugin/plugin-nvptx.c (nvptx_free): Change "GOMP_PLUGIN_acc_thread
> () == NULL"
>     test into check of CUDA_ERROR_NOT_PERMITTED status for
> cuMemGetAddressRange.
>     Adjust comments.


> diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
> index ec103a2f40b..188a34f1d04 100644
> --- a/libgomp/plugin/plugin-nvptx.c
> +++ b/libgomp/plugin/plugin-nvptx.c
> @@ -1038,27 +1038,34 @@ goacc_profiling_acc_ev_free (struct goacc_thread 
> *thr, void *p)
>  }
>  
>  static bool
>  nvptx_free (void *p, struct ptx_device *ptx_dev)
>  {
> -  /* Assume callback context if this is null.  */
> -  if (GOMP_PLUGIN_acc_thread () == NULL)
> +  CUdeviceptr pb;
> +  size_t ps;
> +
> +  CUresult r = CUDA_CALL_NOCHECK (cuMemGetAddressRange, &pb, &ps,
> +                               (CUdeviceptr) p);
> +  if (r == CUDA_ERROR_NOT_PERMITTED)
>      {
> +      /* We assume that this error indicates we are in a CUDA callback 
> context,
> +      where all CUDA calls are not allowed. Arrange to free this piece of
> +      device memory later.  */
>        struct ptx_free_block *n
>       = GOMP_PLUGIN_malloc (sizeof (struct ptx_free_block));
>        n->ptr = p;
>        pthread_mutex_lock (&ptx_dev->free_blocks_lock);
>        n->next = ptx_dev->free_blocks;
>        ptx_dev->free_blocks = n;
>        pthread_mutex_unlock (&ptx_dev->free_blocks_lock);
>        return true;
>      }
> -
> -  CUdeviceptr pb;
> -  size_t ps;
> -
> -  CUDA_CALL (cuMemGetAddressRange, &pb, &ps, (CUdeviceptr) p);
> +  else if (r != CUDA_SUCCESS)
> +    {
> +      GOMP_PLUGIN_error ("cuMemGetAddressRange error: %s", cuda_error (r));
> +      return false;
> +    }
>    if ((CUdeviceptr) p != pb)
>      {
>        GOMP_PLUGIN_error ("invalid device address");
>        return false;
>      }

Reply via email to