On 21/10/2025 12:20, Tobias Burnus wrote:
Andrew Stubbs wrote in "[PATCH v7 0/2] libgomp: OpenMP pinned memory improvements":
Compared to the v6 patchset I sent last week, this series has had the
gomp_debug noise and some vestigial cruft removed, [...]

Andrew Stubbs wrote in this thread:

Use Cuda to pin memory, instead of Linux mlock, when available.

There are two advantages: firstly, this gives a significant speed boost for
NVPTX offloading, and secondly, it side-steps the usual OS ulimit/rlimit
setting.
...
OK for mainline?
...
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
-@item The @code{pinned} trait is supported on Linux hosts, but is subject to -      the OS @code{ulimit}/@code{rlimit} locked memory settings.  It currently -      uses @code{mmap} and is therefore optimized for few allocations, including
-      large data.  If the conditions for numa or memkind allocations are
-      fulfilled, those allocators are used instead.
+@item The @code{pinned} trait is supported on Linux hosts, but is usually +      subject to the OS @code{ulimit}/@code{rlimit} locked memory settings (see
+      @ref{Offload-Target Specifics} for exceptions).  It currently uses
+      @code{mmap} and is therefore optimized for few allocations, including
+      large data.

I think I mentioned this before, but it looks as if it got lost
while removing large quoted code blocks in the reply email :-/

The patch removes the sentence:
"If the conditions for numa or memkind allocations are fulfilled,
  those allocators are used instead."

Given the current implementation, I think this sensence should
stay.  However, it seems as if the proper change is the
following.

(A) Looking at the current code, I wonder whether we need in
omp_init_allocator something like:

   /* Reject unsupported memory spaces.  */
#if defined(LIBGOMP_USE_MEMKIND) || defined(LIBGOMP_USE_LIBNUMA)
   if (data.pinned || data.memkind != GOMP_MEMKIND_NONE)
     return omp_null_allocator;
#end if
   if (!MEMSPACE_VALIDATE (data.memspace, data.access, data.pinned))
     return omp_null_allocator;

(B) Mentioning that (possibly not in this @item but elsewhere in the
section) that 'if requesting pinned memory and the conditions for
numa or memkind allocations are fulfilled', the creation of the
user allocator will fail.

[Or, alternatively, changing the priority: Using the pinned-memory
allocator, resetting the use of memkind – under the assumption that
pinning is more important than using memkind/numa - as it might be
detectable as a user.]

As the more proper changes still imply that the removed sentence
should be removed: I think this patch is fine, but we really should
do the follow-up fix.

The sentence was removed because it's not about the "pinned" trait. Memkind can be used for the high-bandwidth and large-capacity memory spaces, but this part of the manual is talking about pinned memory.

The patch probably does make sense though, although I don't like that supporting libmemkind makes some allocators invalid that would normally work fine.

* * *


--- a/libgomp/target.c
+++ b/libgomp/target.c
...
+static struct gomp_device_descr *
+get_device_for_page_locked (void)
+{
...
+  if (device == (void *) -1)
+    {
...
+      if (device)
+        gomp_fatal ("Unclear how %s and %s libgomp plugins may"
+            " simultaneously provide functionality"
+            " for page-locked memory",
+            device->name, devices[i].name);
+      else
+        device = &devices[i];
+    }
* * *
+gomp_page_locked_host_alloc (void **ptr, size_t size)
+{
+  struct gomp_device_descr *device = get_device_for_page_locked ();
+  if (device)
+    {
+      gomp_mutex_lock (&device->lock);
+      if (device->state == GOMP_DEVICE_UNINITIALIZED)
+    {
+      gomp_init_device (device);
+      gomp_debug (0, "Using device %s for page-locked memory\n",
+              device->name);

Wouldn't make more sense to move this diagnostic up to
after 'device =' in get_device_for_page_locked (as quoted above,
above the '* * *' line)?

In both locations, it should print once – but here only if
the device wasn't initialized before, in the other case at
the first time device-for-page-lock is checked.

Having this output also after any device-initializing function
call seems to make more sense.

(BTW: 'gomp_fatal' calls 'exit', the 'else' is not really
needed.)

OK, I think it makes sense to always have this message in the log, without depending on the initialization order.

* * *

LGTM – as is or with the suggestion above applied, but I think
with the moved diagnostic it is nicer.

Thanks for the patch!

I will make the diagnostic change, rebase, retest and then commit both patches.

Andrew

Reply via email to