Re: [PATCH v7 1/2] libgomp, nvptx: Cuda pinned memory

Tobias Burnus Tue, 21 Oct 2025 04:21:08 -0700

Andrew Stubbs wrote in "[PATCH v7 0/2] libgomp: OpenMP pinned memory 
improvements":

Compared to the v6 patchset I sent last week, this series has had the
gomp_debug noise and some vestigial cruft removed, [...]


Andrew Stubbs wrote in this thread:

Use Cuda to pin memory, instead of Linux mlock, when available.

There are two advantages: firstly, this gives a significant speed boost for
NVPTX offloading, and secondly, it side-steps the usual OS ulimit/rlimit
setting.

...

OK for mainline?

...

--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
-@item The @code{pinned} trait is supported on Linux hosts, but is subject to
-      the OS @code{ulimit}/@code{rlimit} locked memory settings.  It currently
-      uses @code{mmap} and is therefore optimized for few allocations, 
including
-      large data.  If the conditions for numa or memkind allocations are
-      fulfilled, those allocators are used instead.
+@item The @code{pinned} trait is supported on Linux hosts, but is usually
+      subject to the OS @code{ulimit}/@code{rlimit} locked memory settings (see
+      @ref{Offload-Target Specifics} for exceptions).  It currently uses
+      @code{mmap} and is therefore optimized for few allocations, including
+      large data.


I think I mentioned this before, but it looks as if it got lost
while removing large quoted code blocks in the reply email :-/

The patch removes the sentence:
"If the conditions for numa or memkind allocations are fulfilled,
 those allocators are used instead."

Given the current implementation, I think this sensence should
stay.  However, it seems as if the proper change is the
following.

(A) Looking at the current code, I wonder whether we need in
omp_init_allocator something like:

  /* Reject unsupported memory spaces.  */
#if defined(LIBGOMP_USE_MEMKIND) || defined(LIBGOMP_USE_LIBNUMA)
  if (data.pinned || data.memkind != GOMP_MEMKIND_NONE)
    return omp_null_allocator;
#end if
  if (!MEMSPACE_VALIDATE (data.memspace, data.access, data.pinned))
    return omp_null_allocator;

(B) Mentioning that (possibly not in this @item but elsewhere in the
section) that 'if requesting pinned memory and the conditions for
numa or memkind allocations are fulfilled', the creation of the
user allocator will fail.

[Or, alternatively, changing the priority: Using the pinned-memory
allocator, resetting the use of memkind – under the assumption that
pinning is more important than using memkind/numa - as it might be
detectable as a user.]

As the more proper changes still imply that the removed sentence
should be removed: I think this patch is fine, but we really should
do the follow-up fix.

* * *

--- a/libgomp/target.c
+++ b/libgomp/target.c

...

+static struct gomp_device_descr *
+get_device_for_page_locked (void)
+{

...

+  if (device == (void *) -1)
+    {

...

+         if (device)
+           gomp_fatal ("Unclear how %s and %s libgomp plugins may"
+                       " simultaneously provide functionality"
+                       " for page-locked memory",
+                       device->name, devices[i].name);
+         else
+           device = &devices[i];
+       }

* * *

+gomp_page_locked_host_alloc (void **ptr, size_t size)
+{
+  struct gomp_device_descr *device = get_device_for_page_locked ();
+  if (device)
+    {
+      gomp_mutex_lock (&device->lock);
+      if (device->state == GOMP_DEVICE_UNINITIALIZED)
+       {
+         gomp_init_device (device);
+         gomp_debug (0, "Using device %s for page-locked memory\n",
+                     device->name);


Wouldn't make more sense to move this diagnostic up to
after 'device =' in get_device_for_page_locked (as quoted above,
above the '* * *' line)?

In both locations, it should print once – but here only if
the device wasn't initialized before, in the other case at
the first time device-for-page-lock is checked.

Having this output also after any device-initializing function
call seems to make more sense.

(BTW: 'gomp_fatal' calls 'exit', the 'else' is not really
needed.)

* * *

LGTM – as is or with the suggestion above applied, but I think
with the moved diagnostic it is nicer.

Thanks for the patch!

Tobias

PS: I would be happy if you could look into the numa/memkind
vs. pinned issue, as follow up.

Re: [PATCH v7 1/2] libgomp, nvptx: Cuda pinned memory

Reply via email to