Hi Tobias!

On 2024-09-15T00:32:21+0200, Tobias Burnus <tbur...@baylibre.com> wrote:
> The idea of link variables is to replace he full device variable by a 
> pointer, permitting to map only parts of the variable to the device, 
> saving memory.
>
> However, having a pointer permits for (unified) shared memory to point 
> to the host variable.
>
> That's what this patch does: instead of having a dangling pointer, upon 
> loading the image, the device side pointers are updated to point to the 
> host. With the current patch, this is only done when explicitly 
> requesting unified-shared memory.
>
> Tested on x86-64-gnu-linux and nvptx offloading (that supports USM).

(I yet have to set up such a USM configuration...)

> Remarks/comments/suggestions before I commit it?

> libgomp: with USM, init 'link' variables with host address
>
> If requires unified_shared_memory is set, make 'declare target link'
> variables to point initially to the host pointer.
>
> libgomp/ChangeLog:
>
>       * target.c (gomp_load_image_to_device): For requires
>       unified_shared_memory, update 'link' vars to point to the host var.
>       * testsuite/libgomp.c-c++-common/target-link-3.c: New test.
>
>  libgomp/target.c                                   |  5 +++
>  .../testsuite/libgomp.c-c++-common/target-link-3.c | 52 
> ++++++++++++++++++++++
>  2 files changed, 57 insertions(+)

> --- a/libgomp/target.c
> +++ b/libgomp/target.c
> @@ -2451,6 +2451,11 @@ gomp_load_image_to_device (struct gomp_device_descr 
> *devicep, unsigned version,
>        array->right = NULL;
>        splay_tree_insert (&devicep->mem_map, array);
>        array++;

Do I understand correctly that even if
'GOMP_REQUIRES_UNIFIED_SHARED_MEMORY', we cannot just skip all the
'mem_map' setup in 'gomp_load_image_to_device' etc., because we're not
(yet?) setting 'GOMP_OFFLOAD_CAP_SHARED_MEM'?  (I've not yet worked
through the "libgomp: Enable USM for some nvptx devices" discussion from
earlier this year.)

> +
> +      if (is_link_var
> +       && (omp_requires_mask & GOMP_REQUIRES_UNIFIED_SHARED_MEMORY))
> +     gomp_copy_host2dev (devicep, NULL, (void *) target_var->start,
> +                         &k->host_start, sizeof (void *), false, NULL);
>      }

Calling 'gomp_copy_host2dev' looks a bit funny given we've just
determined USM (..., but I'm not asking for plain 'memcpy').

There is nothing to un-do in 'gomp_unload_image_from_device', right?

What's the advantage/rationale of doing this here vs. in
'gomp_map_vars_internal' for 'REFCOUNT_LINK'?  (May be worth a source
code comment?)

> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.c-c++-common/target-link-3.c
> @@ -0,0 +1,52 @@
> +/* { dg-do run }  */
> +
> +#include <stdint.h>
> +#include <omp.h>
> +
> +#pragma omp requires unified_shared_memory
> +

Intentionally mixing non-'static' vs. 'static' in the following?

> +int A[3] = {-3,-4,-5};
> +static int q = -401;
> +#pragma omp declare target link(A, q)
> +
> +#pragma omp begin declare target
> +void
> +f (uintptr_t *pA, uintptr_t *pq)
> +{
> +  if (A[0] != 1 || A[1] != 2 || A[2] != 3 || q != 42)
> +    __builtin_abort ();
> +  A[0] = 13;
> +  A[1] = 14;
> +  A[2] = 15;
> +  q = 23;
> +  *pA = (uintptr_t) &A[0];
> +  *pq = (uintptr_t) &q;
> +}
> +#pragma omp end declare target
> +
> +int
> +main ()
> +{
> +  uintptr_t hpA = (uintptr_t) &A[0];
> +  uintptr_t hpq = (uintptr_t) &q;
> +  uintptr_t dpA, dpq;
> +
> +  A[0] = 1;
> +  A[1] = 2;
> +  A[2] = 3;
> +  q = 42;
> +
> +  for (int i = 0; i <= omp_get_num_devices (); ++i)
> +    {
> +      #pragma omp target device(device_num: i) map(dpA, dpq)
> +     f (&dpA, &dpq);
> +      if (hpA != dpA || hpq != dpq)
> +     __builtin_abort ();
> +      if (A[0] != 13 || A[1] != 14 || A[2] != 15 || q != 23)
> +     __builtin_abort ();
> +      A[0] = 1;
> +      A[1] = 2;
> +      A[2] = 3;
> +      q = 42;
> +    }
> +}

Reply via email to