Hi Tobias! On 2024-09-15T00:32:21+0200, Tobias Burnus <tbur...@baylibre.com> wrote: > The idea of link variables is to replace he full device variable by a > pointer, permitting to map only parts of the variable to the device, > saving memory. > > However, having a pointer permits for (unified) shared memory to point > to the host variable. > > That's what this patch does: instead of having a dangling pointer, upon > loading the image, the device side pointers are updated to point to the > host. With the current patch, this is only done when explicitly > requesting unified-shared memory. > > Tested on x86-64-gnu-linux and nvptx offloading (that supports USM).
(I yet have to set up such a USM configuration...) > Remarks/comments/suggestions before I commit it? > libgomp: with USM, init 'link' variables with host address > > If requires unified_shared_memory is set, make 'declare target link' > variables to point initially to the host pointer. > > libgomp/ChangeLog: > > * target.c (gomp_load_image_to_device): For requires > unified_shared_memory, update 'link' vars to point to the host var. > * testsuite/libgomp.c-c++-common/target-link-3.c: New test. > > libgomp/target.c | 5 +++ > .../testsuite/libgomp.c-c++-common/target-link-3.c | 52 > ++++++++++++++++++++++ > 2 files changed, 57 insertions(+) > --- a/libgomp/target.c > +++ b/libgomp/target.c > @@ -2451,6 +2451,11 @@ gomp_load_image_to_device (struct gomp_device_descr > *devicep, unsigned version, > array->right = NULL; > splay_tree_insert (&devicep->mem_map, array); > array++; Do I understand correctly that even if 'GOMP_REQUIRES_UNIFIED_SHARED_MEMORY', we cannot just skip all the 'mem_map' setup in 'gomp_load_image_to_device' etc., because we're not (yet?) setting 'GOMP_OFFLOAD_CAP_SHARED_MEM'? (I've not yet worked through the "libgomp: Enable USM for some nvptx devices" discussion from earlier this year.) > + > + if (is_link_var > + && (omp_requires_mask & GOMP_REQUIRES_UNIFIED_SHARED_MEMORY)) > + gomp_copy_host2dev (devicep, NULL, (void *) target_var->start, > + &k->host_start, sizeof (void *), false, NULL); > } Calling 'gomp_copy_host2dev' looks a bit funny given we've just determined USM (..., but I'm not asking for plain 'memcpy'). There is nothing to un-do in 'gomp_unload_image_from_device', right? What's the advantage/rationale of doing this here vs. in 'gomp_map_vars_internal' for 'REFCOUNT_LINK'? (May be worth a source code comment?) > --- /dev/null > +++ b/libgomp/testsuite/libgomp.c-c++-common/target-link-3.c > @@ -0,0 +1,52 @@ > +/* { dg-do run } */ > + > +#include <stdint.h> > +#include <omp.h> > + > +#pragma omp requires unified_shared_memory > + Intentionally mixing non-'static' vs. 'static' in the following? > +int A[3] = {-3,-4,-5}; > +static int q = -401; > +#pragma omp declare target link(A, q) > + > +#pragma omp begin declare target > +void > +f (uintptr_t *pA, uintptr_t *pq) > +{ > + if (A[0] != 1 || A[1] != 2 || A[2] != 3 || q != 42) > + __builtin_abort (); > + A[0] = 13; > + A[1] = 14; > + A[2] = 15; > + q = 23; > + *pA = (uintptr_t) &A[0]; > + *pq = (uintptr_t) &q; > +} > +#pragma omp end declare target > + > +int > +main () > +{ > + uintptr_t hpA = (uintptr_t) &A[0]; > + uintptr_t hpq = (uintptr_t) &q; > + uintptr_t dpA, dpq; > + > + A[0] = 1; > + A[1] = 2; > + A[2] = 3; > + q = 42; > + > + for (int i = 0; i <= omp_get_num_devices (); ++i) > + { > + #pragma omp target device(device_num: i) map(dpA, dpq) > + f (&dpA, &dpq); > + if (hpA != dpA || hpq != dpq) > + __builtin_abort (); > + if (A[0] != 13 || A[1] != 14 || A[2] != 15 || q != 23) > + __builtin_abort (); > + A[0] = 1; > + A[1] = 2; > + A[2] = 3; > + q = 42; > + } > +}