On 08/09/2023 10:04, Tobias Burnus wrote:

Regarding patch 2/3 and MEMSPACE_VALIDATE.

In general, I wonder how to handle memory spaces (and traits) that
aren't supported. Namely, when to return 0L and when to silently use
ignore the trait / use another memory space.

The current omp_init_allocator code only returns omp_null_allocator for
invalid value – or for pinned memory (as it is unsupported). [RFC: Shall
we keep doing so – or return omp_null_mem_alloc more often? →
https://gcc.gnu.org/PR111044 for this question, improving libmemkind
usage, and extending the allocator-related documentation.]

As we do it on the host, I think auto-fallback to omp_default_mem_space
is is also find for nvptx (and gcn), but not as done in 2/3 but slightly
different:

(a) In omp_init_allocator, there should be a check whether it is
supported, if not, we can fallback to using default memory space. (In
line with the current code host + 1/2+2/3 nvptx behaviour.)

Note: That's not the same as the current 2/3 patch. Currently, if
MEMSPACE_VALIDATE fails, a retry is attempted – but the outcome depends
on the value for 'fallback'. When changing the memory space during
omp_init_allocator, only failed 'malloc' will give abort with abort_fb.

(b) For nvptx_memspace_validate, I think an additional check should be
done based on the __PTX_ISA_VERSION* as it feels off if plugin first
claims support for it but later unconditionally uses malloc at runtime.

I have looked at moving the MEMSPACE_VALIDATE call into omp_init_allocator so that we can't even create allocators that would be invalid, but that changes the semantics of the fall-back traits. Here's the example from testcase omp_alloc-traits.c:

  omp_alloctrait_t traits_all[2]
    = { { omp_atk_fallback, omp_atv_null_fb },
        { omp_atk_access, omp_atv_all } };
  omp_allocator_handle_t lowlat_all
    = omp_init_allocator (omp_low_lat_mem_space, 2, traits_all);

  /* ... */

  void *b = omp_alloc (1, lowlat_all);

With my patch as proposed, "lowlat_all" is a valid allocator, but allocating low-latency memory fails in omp_alloc, so "b" ends up NULL (the fall-back setting).

With the proposed change, "lowlat_all" becomes omp_null_allocator, and "b" is non-NULL, pointing to default memory. This is probably surprising to the user because they thought they specified "low-latency or nothing".

Another option would be to create a custom allocator that goes straight to the fall-back somehow (we could invent an internal value "ompx_fallback_mem_space", or some such).

What is the desired behaviour in this case? I'm not sure that what the OpenMP spec actually says matches what the intention seems to have been with fallbacks.

(c) We also need to handle omp_low_lat_mem_alloc. I think the spec
implies access:all but nvptx/gcn only support cgroup (+ pteams +
thread), potentially leading to wrong code.
If we're not allowed to default to "cgroup" then surely omp_low_lat_mem_alloc is useless on all GPU devices (that I am aware of) on all toolchains? There may be some use on some specialist NUMA host devices, but that's it.

Andrew

Reply via email to