Hi Chung-Lin,

some first throughts + remarks, no proper review yet,
but I want to make sure that I don't loose those thoughts
& remarks.

Chung-Lin Tang wrote:

This is a new updated patch for OpenMP uses_allocators support.

First, as meanwhile libgomp/config/nvptx/allocator.c was touched,
you have to trivially rediff it.

However, you also want to update the arguments of
GOMP_OFFLOAD_memspace_validate to match the normal memspace_validate.

I think you also want to add a comment to both the two
GOMP_OFFLOAD_memspace_validate and to the memspace_validate
in config/*/allocator.c referring to another and noting that those
should be kept in sync.

* * *

The following seems to be manually added code – and it break bootstrap's
stage1 – and presumably building without offloading configured:

--- a/libgomp/configure
+++ b/libgomp/configure
@@ -15744,11 +15744,13 @@ if test x"$enable_offload_targets" != x; then
      fi
    done
  fi
+if test x"$offload_plugins" != x; then
cat >>confdefs.h <<_ACEOF
  #define OFFLOAD_PLUGINS "$offload_plugins"
  _ACEOF
+fi
   if test $PLUGIN_NVPTX = 1; then
    PLUGIN_NVPTX_TRUE=
    PLUGIN_NVPTX_FALSE='#'

* * *


2. The target teams issue has been solved by a host-side implementation:
    omp_init_allocator() is called on host side, and mapped to target,
    with the allocator passed in as a firstprivate variable.
    Some adjustments where made to ensure that host/device side must have
    same omp_allocator_data format.

This is based on the discussion in OpenMP specification Issue #4398.
This lead to some wording changes that made it into TR14 (OpenMP 6.1
preview), cf. "13.6 allocate Clause" and "Restrictions to the target construct
are as follows".

However, the latter applies to using the 'allocate' clause on 'target',
cf. related patch
https://gcc.gnu.org/pipermail/gcc-patches/2025-June/687685.html
+https://gcc.gnu.org/pipermail/gcc-patches/2025-July/691180.html

* * *

Note that several new tests need my recently submitted testsuite patch to test 
correctly:

… or, alternatively, need to move to libgomp.
(The issue is that 'omp.h' is not currently available when
testing gcc/testsuite, only for libgomp/testsuite/ tests.)

* * *

[FYI – no action required for the following two items:]

Side remark: When testing the patch, I run into the following
(preexisting, somewhat related) GCC bug:
PR122728 - [OpenMP] Bogus warning "allocator with access trait set to
           ‘thread’ results in undefined behavior for ‘target’ directive"
(Related: PR111042)
PR122748- [OpenMP] ICE segfault with invalid 'allocate' clause in c_parser_omp_clause_allocate at c-parser.cc:19348
On the other hand, I have now filed the OpenMP_VV issue
"test_omp_target_aligned_alloc_device.c violates "traits-array modifier
must be a constant array" restriction" https://github.com/OpenMP-Validation-and-Verification/OpenMP_VV/issues/892 (+ added a pull request) * * * I think we should mark the trait as used: const omp_alloctrait_t traits[] = {{omp_atk_alignment, 64}}; omp_allocator_handle_t alloc = omp_null_allocator; #pragma omp target map(tofrom: errors) uses_allocators(alloc(traits)) → warning: variable ‘traits’ set but not used [-Wunused-but-set-variable=] And presumably likewise for the memspace, if passed as variable (cf. tests/5.1/allocate/test_omp_target_aligned_alloc_device.c albeit there the omp_init_allocator hides the diagnostic + memspace is not used in uses_allocators.) * * * The OpenMP 5.2 specification has in "B.1 Deprecated Features":

"The argument that specified the arguments of the uses_allocators
 clause as a comma-separated list in which each list item is a
 clause-argument-specification of the form allocator[(traits)]
 was deprecated."

Assuming the version bump + deprecation patch lands first, we
a warning_at/gfc_warning for OPT_Wdeprecated_openmp is required
for C/C++/Fortran and updating the
"Deprecation of traits array following the allocator_handle
expression in|uses_allocators" entry in libgomp.texi. However, this can also be done as follow-up, esp. if the deprecation patch has not yet landed. * * * Documentation: Once the patch lands, the following entries can be marked as "Y" in libgomp/libgomp.texi: 5.2: "|New|memspace| and|traits| modifiers for|uses_allocators"| |[For this + deprecation item: current page: https://gcc.gnu.org/onlinedocs/libgomp/OpenMP-5_002e2.html ] |
* * *

--- a/libgomp/configure.ac
+++ b/libgomp/configure.ac
@@ -365,6 +365,17 @@ CFLAGS="$save_CFLAGS $XCFLAGS"
  # had a chance to set XCFLAGS.
  LIBGOMP_CHECK_SYNC_BUILTINS

+if test x$libgomp_cv_have_sync_builtins = xno; then
+  # We require accelerator targets to support __sync_* builtins.
+  if test x$libgomp_offloaded_only = xyes; then
+    AC_MSG_ERROR([accelerator targets require __sync_val_compare_and_swap to 
build libgomp.])
+  fi
+  # Same for offload hosts.
+  if test x"$offload_plugins" = x; then
+    AC_MSG_ERROR([offload hosts require __sync_val_compare_and_swap to build 
libgomp.])
+  fi
+fi

The host check looks wrong …

Additionally, I think it is cleaner to use:

+++ b/libgomp/allocator.c
@@ -218,14 +218,24 @@ struct omp_allocator_data
...
+  /* To unify the format of this type across host/accelerator, enable
+     this field unconditionally when offload is enabled.  */
+  #if defined(LIBGOMP_USE_MEMKIND) || defined(LIBGOMP_USE_LIBNUMA) ||  \
+    defined(OFFLOAD_PLUGINS) || defined(LIBGOMP_OFFLOADED_ONLY)
    unsigned int memkind : 8;
  #endif
First, here the indentation is wrong: #if and #endif should be indented
by the same amount.

+  /* Note: we now require __sync builtins for offload host/accelerator,
+     checked during configuration. This lock should never be enabled
+     for offload configs.  */
  #ifndef HAVE_SYNC_BUILTINS
    gomp_mutex_t lock;
  #endif
  };

And here, I wonder whether we should just do:

/* With offloading, we require that both the host and the device have
   __sync builtins as uses_allocators initializes the device allocator
   on the host.  */
#if (defined(OFFLOAD_PLUGINS) || defined(LIBGOMP_OFFLOADED_ONLY)) \
    && !defined(HAVE_SYNC_BUILTINS))
 #error "...."
#endif

Because this makes immediately clear why this is required. Having it just
in configure(.ac,) does not tell why it is needed.

Or is there a real advantage of using it?

* * *


The following code produces an omp_null_allocator only
if there is no unified-shared memory, which feels odd:


#include <omp.h>
#include <stdint.h>

#ifdef VARIANT
  #pragma omp requires self_maps
#endif

int main() {
  const omp_alloctrait_t traits[] = {{omp_atk_pinned, omp_atv_true}, 
{omp_atk_access, omp_atv_all}};
  omp_allocator_handle_t alloc = omp_null_allocator;

  #pragma omp target uses_allocators(memspace(omp_low_lat_mem_space), 
traits(traits) : alloc)
  {
     __builtin_printf ("%p\n", (void*) alloc);
     int *ptr = (int*)omp_alloc(sizeof(int)*5, alloc);
     ptr[4] = 5;
     omp_free (ptr, omp_null_allocator);
  }
}


The reason for this is that the normal host omp_init_allocator
is called – and only for one the GOMP_omp_allocator_map does the
check.

* * *

+omp_allocator_handle_t
+GOMP_omp_allocator_map (omp_allocator_handle_t host_handle)
+{
+  if (!host_handle)
+    return host_handle;
+  struct gomp_device_descr *devicep = resolve_device (-1, true);

This does not handle:
  omp target device(my_devnum) uses_allocators(...)

* * *

As mentioned a couple of times - and being ignored - I think you should
use something like:

size_t len;
alloc_ = GOMP_init_allocator (devnum, memspace, n_traits, traits, &len);

...
omp target map(to: *alloc_ [len: len]) firstprivate(alloc + pointer attach)

This way, the memory transfers are properly handled.


Your current version has a race:

#include <math.h>
#include <omp.h>

int main() {
  const omp_alloctrait_t traits[] = {{omp_atk_alignment, 1024}};
  omp_allocator_handle_t alloc = omp_null_allocator;

  #pragma omp target uses_allocators(traits(traits) : alloc) nowait
  {
     __builtin_printf ("%p\n", (void*) alloc);
     double x;
     for (int i = 0; i < 100000; i++)
       x = sin((double)i);
     int *ptr = (int*)omp_alloc(sizeof(int), alloc);
     *ptr = 5;
     omp_free (ptr, omp_null_allocator);
  }
}


Namely:

      #pragma omp target num_teams(-2) thread_limit(0) nowait ...
        {
...
          D.5236 = .omp_data_i->alloc;
          D.5237 = (omp_allocator_handle_t) D.5236;
          alloc = D.5237;
...
            ptr = omp_alloc (4, alloc);
...
        }
      .omp_data_arr.4 = {CLOBBER};
    }
    __builtin_GOMP_omp_allocator_unmap (host_allocator.1);
    __builtin_omp_destroy_allocator (host_allocator.1);

As target runs asynchronously, the 'release' of the device memory happens 
before.

Hmm, actually, there is a general issue – the 'omp_destroy_allocator'
happens too early :-(

Tobias

Reply via email to