Re: [PATCH] OpenMP, libgomp: Add new runtime routines omp_target_memcpy_async and omp_target_memcpy_rect_async

2022-05-05 Thread Jakub Jelinek via Fortran
On Mon, Feb 21, 2022 at 12:19:20PM +0100, Marcel Vollweiler wrote:
> gcc/ChangeLog:
> 
>   * omp-low.cc (omp_runtime_api_call): Added target_memcpy_async and
>   target_memcpy_rect_async to omp_runtime_apis array.
> 
> libgomp/ChangeLog:
> 
>   * libgomp.map: Added omp_target_memcpy_async and
>   omp_target_memcpy_rect_async.
>   * libgomp.texi: Both functions are now supported.
>   * omp.h.in: Added omp_target_memcpy_async and
>   omp_target_memcpy_rect_async.
>   * omp_lib.f90.in: Added interfaces for both new functions.
>   * omp_lib.h.in: Likewise.
>   * target.c (omp_target_memcpy): Restructured into check and copy part.
>   (omp_target_memcpy_check): New helper function for omp_target_memcpy and
>   omp_target_memcpy_async that checks requirements.
>   (omp_target_memcpy_copy): New helper function for omp_target_memcpy and
>   omp_target_memcpy_async that performs the memcpy.
>   (omp_target_memcpy_async_helper): New helper function that is used in
>   omp_target_memcpy_async for the asynchronous task.
>   (omp_target_memcpy_async): Added.
>   (omp_target_memcpy_rect): Restructured into check and copy part.
>   (omp_target_memcpy_rect_check): New helper function for
>   omp_target_memcpy_rect and omp_target_memcpy_rect_async that checks
>   requirements.
>   (omp_target_memcpy_rect_copy): New helper function for
>   omp_target_memcpy_rect and omp_target_memcpy_rect_async that performs
>   the memcpy.
>   (omp_target_memcpy_rect_async_helper): New helper function that is used
>   in omp_target_memcpy_rect_async for the asynchronous task.
>   (omp_target_memcpy_rect_async): Added.
>   * testsuite/libgomp.c-c++-common/target-memcpy-async-1.c: New test.
>   * testsuite/libgomp.c-c++-common/target-memcpy-async-2.c: New test.
>   * testsuite/libgomp.c-c++-common/target-memcpy-rect-async-1.c: New test.
>   * testsuite/libgomp.c-c++-common/target-memcpy-rect-async-2.c: New test.
>   * testsuite/libgomp.fortran/target-memcpy-async-1.f90: New test.
>   * testsuite/libgomp.fortran/target-memcpy-async-2.f90: New test.
>   * testsuite/libgomp.fortran/target-memcpy-rect-async-1.f90: New test.
>   * testsuite/libgomp.fortran/target-memcpy-rect-async-2.f90: New test.
> 
> --- a/libgomp/libgomp.map
> +++ b/libgomp/libgomp.map
> @@ -224,6 +224,8 @@ OMP_5.1 {
>   omp_set_teams_thread_limit_8_;
>   omp_get_teams_thread_limit;
>   omp_get_teams_thread_limit_;
> + omp_target_memcpy_async;
> + omp_target_memcpy_rect_async;
>  } OMP_5.0.2;

These should be added to OMP_5.1.1, not here.

> --- a/libgomp/omp.h.in
> +++ b/libgomp/omp.h.in
> @@ -272,6 +272,10 @@ extern int omp_target_is_present (const void *, int) 
> __GOMP_NOTHROW;
>  extern int omp_target_memcpy (void *, const void *, __SIZE_TYPE__,
> __SIZE_TYPE__, __SIZE_TYPE__, int, int)
>__GOMP_NOTHROW;
> +extern int omp_target_memcpy_async (void *, const void *, __SIZE_TYPE__,
> + __SIZE_TYPE__, __SIZE_TYPE__, int, int,
> + int, omp_depend_t*)

Formatting, space before *.

> +  __GOMP_NOTHROW;
>  extern int omp_target_memcpy_rect (void *, const void *, __SIZE_TYPE__, int,
>  const __SIZE_TYPE__ *,
>  const __SIZE_TYPE__ *,
> @@ -279,6 +283,14 @@ extern int omp_target_memcpy_rect (void *, const void *, 
> __SIZE_TYPE__, int,
>  const __SIZE_TYPE__ *,
>  const __SIZE_TYPE__ *, int, int)
>__GOMP_NOTHROW;
> +extern int omp_target_memcpy_rect_async (void *, const void *, __SIZE_TYPE__,
> +  int, const __SIZE_TYPE__ *,
> +  const __SIZE_TYPE__ *,
> +  const __SIZE_TYPE__ *,
> +  const __SIZE_TYPE__ *,
> +  const __SIZE_TYPE__ *, int, int, int,
> +  omp_depend_t*)

Likewise.

> -int
> -omp_target_memcpy (void *dst, const void *src, size_t length,
> -size_t dst_offset, size_t src_offset, int dst_device_num,
> -int src_device_num)
> +static int
> +omp_target_memcpy_check (void *dst, const void *src, int dst_device_num,
> +  int src_device_num,
> +  struct gomp_device_descr **dst_devicep,
> +  struct gomp_device_descr **src_devicep)
>  {

Why does omp_target_memcpy_check need the dst and src arguments?  From what
I can see, they aren't used by it.

> +typedef struct
> +{
> +  void *dst;
> +  const void *src;
> +  size_t length;
> +  size_t dst_offset;
> +  size_t src_offset;
> +  struct gomp_device_descr *dst_devicep;
> +  struct gomp_device_descr *src_devicep;
> +} memcpy_t;

Please come up with s

Re: [Patch] OpenMP, libgomp: Add new runtime routine omp_target_is_accessible.

2022-05-05 Thread Jakub Jelinek via Fortran
On Mon, Mar 14, 2022 at 04:42:14PM +0100, Marcel Vollweiler wrote:
> --- a/libgomp/libgomp.map
> +++ b/libgomp/libgomp.map
> @@ -226,6 +226,11 @@ OMP_5.1 {
>   omp_get_teams_thread_limit_;
>  } OMP_5.0.2;
>  
> +OMP_5.1.1 {
> +  global:
> + omp_target_is_accessible;
> +} OMP_5.1;
> +

You've already added another OMP_5.1.1 symbol, so this hunk will need to be
adjusted.  Keep the names in there alphabetically sorted.

> --- a/libgomp/omp_lib.f90.in
> +++ b/libgomp/omp_lib.f90.in
> @@ -835,6 +835,16 @@
>end function omp_target_disassociate_ptr
>  end interface
>  
> +interface
> +  function omp_target_is_accessible (ptr, size, device_num) bind(c)
> +use, intrinsic :: iso_c_binding, only : c_ptr, c_size_t, c_int
> +integer(c_int) :: omp_target_is_accessible

The function returning integer(c_int) rather than logical seems like
a screw up in the standard, but too late to fix that :(.

> --- a/libgomp/target.c
> +++ b/libgomp/target.c
> @@ -3666,6 +3666,24 @@ omp_target_disassociate_ptr (const void *ptr, int 
> device_num)
>  }
>  
>  int
> +omp_target_is_accessible (const void *ptr, size_t size, int device_num)
> +{
> +  if (device_num < 0 || device_num > gomp_get_num_devices ())
> +return false;
> +
> +  if (device_num == gomp_get_num_devices ())
> +return true;
> +
> +  struct gomp_device_descr *devicep = resolve_device (device_num);
> +  if (devicep == NULL)
> +return false;
> +
> +  /* TODO: Unified shared memory must be handled when available.  */
> +
> +  return devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM;

I guess for now it is reasonable, but I wonder if even without
GOMP_OFFLOAD_CAP_SHARED_MEM one can't for CUDA or GCN allocate host
memory (not all, but just some subset) that will be accessible on the
device (I bet that means accessible through the same address on the host and
device, aka partial shared mem).

So, ok for trunk.

OT, tried to look how libomptarget implements it and they don't at least
on llvm-project trunk, but while looking at that, noticed that for
omp_target_is_present they do return false from omp_target_is_present
while we return true.  It is unclear if NULL has corresponding storage
on the device (NULL always corresponds to NULL on the device) or not.

Jakub



Re: [Patch] OpenMP, libgomp: Add new runtime routine omp_target_is_accessible.

2022-05-05 Thread Tobias Burnus

Hi,

On 05.05.22 11:33, Jakub Jelinek via Gcc-patches wrote:

On Mon, Mar 14, 2022 at 04:42:14PM +0100, Marcel Vollweiler wrote:

+interface
+  function omp_target_is_accessible (ptr, size, device_num) bind(c)
+use, intrinsic :: iso_c_binding, only : c_ptr, c_size_t, c_int
+integer(c_int) :: omp_target_is_accessible

The function returning integer(c_int) rather than logical seems like
a screw up in the standard, but too late to fix that :(.


I think the idea is that it can directly call the C function without
needing a wrapper. And as default-kind 'logical' != 'integer(c_int)' in
general, it cannot return logical. (In case of GCC, just claiming that
it is logical would work. But some Fortran compilers use -1 for .true.
and only flip a single bit for .not. For those,
"if(.not.omp_target_is_accessible(..)) will not work properly, if the C
function returns 1.

But I concur that requiring "/= 0" is ugly!


OT, tried to look how libomptarget implements it and they don't at least
on llvm-project trunk, but while looking at that, noticed that for
omp_target_is_present they do return false from omp_target_is_present
while we return true.  It is unclear if NULL has corresponding storage
on the device (NULL always corresponds to NULL on the device) or not.


Regarding NULL: no idea what's the best semantic – we could ask
for clarification.

Regarding target:
I think "false" from on device makes more sense in general, especially
if the device number points to a different device. It might work in some
cases – but false simply plays save. Note that the spec states:

"When called from within a target region the effect is unspecified."

Thus, either behavior is fine.

Tobias

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


Re: [Patch] OpenMP, libgomp: Add new runtime routine omp_target_is_accessible.

2022-05-05 Thread Jakub Jelinek via Fortran
On Thu, May 05, 2022 at 11:45:19AM +0200, Tobias Burnus wrote:
> > On Mon, Mar 14, 2022 at 04:42:14PM +0100, Marcel Vollweiler wrote:
> > > +interface
> > > +  function omp_target_is_accessible (ptr, size, device_num) 
> > > bind(c)
> > > +use, intrinsic :: iso_c_binding, only : c_ptr, c_size_t, 
> > > c_int
> > > +integer(c_int) :: omp_target_is_accessible
> > The function returning integer(c_int) rather than logical seems like
> > a screw up in the standard, but too late to fix that :(.
> 
> I think the idea is that it can directly call the C function without
> needing a wrapper. And as default-kind 'logical' != 'integer(c_int)' in
> general, it cannot return logical. (In case of GCC, just claiming that
> it is logical would work. But some Fortran compilers use -1 for .true.
> and only flip a single bit for .not. For those,
> "if(.not.omp_target_is_accessible(..)) will not work properly, if the C
> function returns 1.
> 
> But I concur that requiring "/= 0" is ugly!

Yeah, but for the APIs that don't have any iso_c_binding arguments
we just use wrappers rather than bind(c) and it allows for more Fortran-like
callers.  So, if omp_target_is_accessible had the *_ wrapper (or alias if
we determine logical ir the same as c_int in the ABI passing), people could
avoid the /= 0 stuff.
Anyway, that is just a thought for future APIs that if they return
false/true only bind(c) isn't always a good idea.

Jakub



Re: [PATCH] OpenMP, libgomp: Add new runtime routines omp_target_memcpy_async and omp_target_memcpy_rect_async

2022-05-05 Thread Tobias Burnus

On 05.05.22 10:30, Jakub Jelinek via Fortran wrote:

+  memcpy_t *a = args;
+  int ret = omp_target_memcpy_copy (a->dst, a->src, a->length, a->dst_offset,
+a->src_offset, a->dst_devicep,
+a->src_devicep);
+  if (ret)
+gomp_fatal ("asynchronous memcpy failed");


I wonder whether that should be 'omp_target_memcpy_async failed' or
similar to make clear that it comes from a user's API call.

Or "asynchronous memcpy API routine failed" to avoid a bit the issue of
...memcpy_async vs. ..._memcpy_rect_aysnc?


I'm not really sure killing the whole program if the copying failed is the
best action.  Has it been discussed on omp-lang?  Perhaps the APIs should
have a way how to propagate the result to the caller when it completes
somehow?


I think it hasn't been discussed – but the question is how to handle it
best with the current API. Namely, should it simply continue at the
taskwait? Having some way to communicate back that it failed would be
useful – either by a by-reference argument or some other more indirect
means.

I think aborting it bad – but not aborting and silently continuing is
likely to break as well. IMO, we the fatal is fine for now, but we might
need to come up with something on the spec side.

Tobias
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955