Re: [PATCH] OpenMP, libgomp: Add new runtime routines omp_target_memcpy_async and omp_target_memcpy_rect_async
On Mon, Feb 21, 2022 at 12:19:20PM +0100, Marcel Vollweiler wrote: > gcc/ChangeLog: > > * omp-low.cc (omp_runtime_api_call): Added target_memcpy_async and > target_memcpy_rect_async to omp_runtime_apis array. > > libgomp/ChangeLog: > > * libgomp.map: Added omp_target_memcpy_async and > omp_target_memcpy_rect_async. > * libgomp.texi: Both functions are now supported. > * omp.h.in: Added omp_target_memcpy_async and > omp_target_memcpy_rect_async. > * omp_lib.f90.in: Added interfaces for both new functions. > * omp_lib.h.in: Likewise. > * target.c (omp_target_memcpy): Restructured into check and copy part. > (omp_target_memcpy_check): New helper function for omp_target_memcpy and > omp_target_memcpy_async that checks requirements. > (omp_target_memcpy_copy): New helper function for omp_target_memcpy and > omp_target_memcpy_async that performs the memcpy. > (omp_target_memcpy_async_helper): New helper function that is used in > omp_target_memcpy_async for the asynchronous task. > (omp_target_memcpy_async): Added. > (omp_target_memcpy_rect): Restructured into check and copy part. > (omp_target_memcpy_rect_check): New helper function for > omp_target_memcpy_rect and omp_target_memcpy_rect_async that checks > requirements. > (omp_target_memcpy_rect_copy): New helper function for > omp_target_memcpy_rect and omp_target_memcpy_rect_async that performs > the memcpy. > (omp_target_memcpy_rect_async_helper): New helper function that is used > in omp_target_memcpy_rect_async for the asynchronous task. > (omp_target_memcpy_rect_async): Added. > * testsuite/libgomp.c-c++-common/target-memcpy-async-1.c: New test. > * testsuite/libgomp.c-c++-common/target-memcpy-async-2.c: New test. > * testsuite/libgomp.c-c++-common/target-memcpy-rect-async-1.c: New test. > * testsuite/libgomp.c-c++-common/target-memcpy-rect-async-2.c: New test. > * testsuite/libgomp.fortran/target-memcpy-async-1.f90: New test. > * testsuite/libgomp.fortran/target-memcpy-async-2.f90: New test. > * testsuite/libgomp.fortran/target-memcpy-rect-async-1.f90: New test. > * testsuite/libgomp.fortran/target-memcpy-rect-async-2.f90: New test. > > --- a/libgomp/libgomp.map > +++ b/libgomp/libgomp.map > @@ -224,6 +224,8 @@ OMP_5.1 { > omp_set_teams_thread_limit_8_; > omp_get_teams_thread_limit; > omp_get_teams_thread_limit_; > + omp_target_memcpy_async; > + omp_target_memcpy_rect_async; > } OMP_5.0.2; These should be added to OMP_5.1.1, not here. > --- a/libgomp/omp.h.in > +++ b/libgomp/omp.h.in > @@ -272,6 +272,10 @@ extern int omp_target_is_present (const void *, int) > __GOMP_NOTHROW; > extern int omp_target_memcpy (void *, const void *, __SIZE_TYPE__, > __SIZE_TYPE__, __SIZE_TYPE__, int, int) >__GOMP_NOTHROW; > +extern int omp_target_memcpy_async (void *, const void *, __SIZE_TYPE__, > + __SIZE_TYPE__, __SIZE_TYPE__, int, int, > + int, omp_depend_t*) Formatting, space before *. > + __GOMP_NOTHROW; > extern int omp_target_memcpy_rect (void *, const void *, __SIZE_TYPE__, int, > const __SIZE_TYPE__ *, > const __SIZE_TYPE__ *, > @@ -279,6 +283,14 @@ extern int omp_target_memcpy_rect (void *, const void *, > __SIZE_TYPE__, int, > const __SIZE_TYPE__ *, > const __SIZE_TYPE__ *, int, int) >__GOMP_NOTHROW; > +extern int omp_target_memcpy_rect_async (void *, const void *, __SIZE_TYPE__, > + int, const __SIZE_TYPE__ *, > + const __SIZE_TYPE__ *, > + const __SIZE_TYPE__ *, > + const __SIZE_TYPE__ *, > + const __SIZE_TYPE__ *, int, int, int, > + omp_depend_t*) Likewise. > -int > -omp_target_memcpy (void *dst, const void *src, size_t length, > -size_t dst_offset, size_t src_offset, int dst_device_num, > -int src_device_num) > +static int > +omp_target_memcpy_check (void *dst, const void *src, int dst_device_num, > + int src_device_num, > + struct gomp_device_descr **dst_devicep, > + struct gomp_device_descr **src_devicep) > { Why does omp_target_memcpy_check need the dst and src arguments? From what I can see, they aren't used by it. > +typedef struct > +{ > + void *dst; > + const void *src; > + size_t length; > + size_t dst_offset; > + size_t src_offset; > + struct gomp_device_descr *dst_devicep; > + struct gomp_device_descr *src_devicep; > +} memcpy_t; Please come up with s
Re: [Patch] OpenMP, libgomp: Add new runtime routine omp_target_is_accessible.
On Mon, Mar 14, 2022 at 04:42:14PM +0100, Marcel Vollweiler wrote: > --- a/libgomp/libgomp.map > +++ b/libgomp/libgomp.map > @@ -226,6 +226,11 @@ OMP_5.1 { > omp_get_teams_thread_limit_; > } OMP_5.0.2; > > +OMP_5.1.1 { > + global: > + omp_target_is_accessible; > +} OMP_5.1; > + You've already added another OMP_5.1.1 symbol, so this hunk will need to be adjusted. Keep the names in there alphabetically sorted. > --- a/libgomp/omp_lib.f90.in > +++ b/libgomp/omp_lib.f90.in > @@ -835,6 +835,16 @@ >end function omp_target_disassociate_ptr > end interface > > +interface > + function omp_target_is_accessible (ptr, size, device_num) bind(c) > +use, intrinsic :: iso_c_binding, only : c_ptr, c_size_t, c_int > +integer(c_int) :: omp_target_is_accessible The function returning integer(c_int) rather than logical seems like a screw up in the standard, but too late to fix that :(. > --- a/libgomp/target.c > +++ b/libgomp/target.c > @@ -3666,6 +3666,24 @@ omp_target_disassociate_ptr (const void *ptr, int > device_num) > } > > int > +omp_target_is_accessible (const void *ptr, size_t size, int device_num) > +{ > + if (device_num < 0 || device_num > gomp_get_num_devices ()) > +return false; > + > + if (device_num == gomp_get_num_devices ()) > +return true; > + > + struct gomp_device_descr *devicep = resolve_device (device_num); > + if (devicep == NULL) > +return false; > + > + /* TODO: Unified shared memory must be handled when available. */ > + > + return devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM; I guess for now it is reasonable, but I wonder if even without GOMP_OFFLOAD_CAP_SHARED_MEM one can't for CUDA or GCN allocate host memory (not all, but just some subset) that will be accessible on the device (I bet that means accessible through the same address on the host and device, aka partial shared mem). So, ok for trunk. OT, tried to look how libomptarget implements it and they don't at least on llvm-project trunk, but while looking at that, noticed that for omp_target_is_present they do return false from omp_target_is_present while we return true. It is unclear if NULL has corresponding storage on the device (NULL always corresponds to NULL on the device) or not. Jakub
Re: [Patch] OpenMP, libgomp: Add new runtime routine omp_target_is_accessible.
Hi, On 05.05.22 11:33, Jakub Jelinek via Gcc-patches wrote: On Mon, Mar 14, 2022 at 04:42:14PM +0100, Marcel Vollweiler wrote: +interface + function omp_target_is_accessible (ptr, size, device_num) bind(c) +use, intrinsic :: iso_c_binding, only : c_ptr, c_size_t, c_int +integer(c_int) :: omp_target_is_accessible The function returning integer(c_int) rather than logical seems like a screw up in the standard, but too late to fix that :(. I think the idea is that it can directly call the C function without needing a wrapper. And as default-kind 'logical' != 'integer(c_int)' in general, it cannot return logical. (In case of GCC, just claiming that it is logical would work. But some Fortran compilers use -1 for .true. and only flip a single bit for .not. For those, "if(.not.omp_target_is_accessible(..)) will not work properly, if the C function returns 1. But I concur that requiring "/= 0" is ugly! OT, tried to look how libomptarget implements it and they don't at least on llvm-project trunk, but while looking at that, noticed that for omp_target_is_present they do return false from omp_target_is_present while we return true. It is unclear if NULL has corresponding storage on the device (NULL always corresponds to NULL on the device) or not. Regarding NULL: no idea what's the best semantic – we could ask for clarification. Regarding target: I think "false" from on device makes more sense in general, especially if the device number points to a different device. It might work in some cases – but false simply plays save. Note that the spec states: "When called from within a target region the effect is unspecified." Thus, either behavior is fine. Tobias - Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
Re: [Patch] OpenMP, libgomp: Add new runtime routine omp_target_is_accessible.
On Thu, May 05, 2022 at 11:45:19AM +0200, Tobias Burnus wrote: > > On Mon, Mar 14, 2022 at 04:42:14PM +0100, Marcel Vollweiler wrote: > > > +interface > > > + function omp_target_is_accessible (ptr, size, device_num) > > > bind(c) > > > +use, intrinsic :: iso_c_binding, only : c_ptr, c_size_t, > > > c_int > > > +integer(c_int) :: omp_target_is_accessible > > The function returning integer(c_int) rather than logical seems like > > a screw up in the standard, but too late to fix that :(. > > I think the idea is that it can directly call the C function without > needing a wrapper. And as default-kind 'logical' != 'integer(c_int)' in > general, it cannot return logical. (In case of GCC, just claiming that > it is logical would work. But some Fortran compilers use -1 for .true. > and only flip a single bit for .not. For those, > "if(.not.omp_target_is_accessible(..)) will not work properly, if the C > function returns 1. > > But I concur that requiring "/= 0" is ugly! Yeah, but for the APIs that don't have any iso_c_binding arguments we just use wrappers rather than bind(c) and it allows for more Fortran-like callers. So, if omp_target_is_accessible had the *_ wrapper (or alias if we determine logical ir the same as c_int in the ABI passing), people could avoid the /= 0 stuff. Anyway, that is just a thought for future APIs that if they return false/true only bind(c) isn't always a good idea. Jakub
Re: [PATCH] OpenMP, libgomp: Add new runtime routines omp_target_memcpy_async and omp_target_memcpy_rect_async
On 05.05.22 10:30, Jakub Jelinek via Fortran wrote: + memcpy_t *a = args; + int ret = omp_target_memcpy_copy (a->dst, a->src, a->length, a->dst_offset, +a->src_offset, a->dst_devicep, +a->src_devicep); + if (ret) +gomp_fatal ("asynchronous memcpy failed"); I wonder whether that should be 'omp_target_memcpy_async failed' or similar to make clear that it comes from a user's API call. Or "asynchronous memcpy API routine failed" to avoid a bit the issue of ...memcpy_async vs. ..._memcpy_rect_aysnc? I'm not really sure killing the whole program if the copying failed is the best action. Has it been discussed on omp-lang? Perhaps the APIs should have a way how to propagate the result to the caller when it completes somehow? I think it hasn't been discussed – but the question is how to handle it best with the current API. Namely, should it simply continue at the taskwait? Having some way to communicate back that it failed would be useful – either by a by-reference argument or some other more indirect means. I think aborting it bad – but not aborting and silently continuing is likely to break as well. IMO, we the fatal is fine for now, but we might need to come up with something on the spec side. Tobias - Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955