Re: [patch] libgomp: cuda.h and omp_target_memcpy_rect cleanup (was: [patch] OpenMP: Call cuMemcpy2D/cuMemcpy3D for nvptx for omp_target_memcpy_rect)

2023-08-09 Thread Thomas Schwinge
Hi Tobias! On 2023-07-28T13:51:41+0200, Tobias Burnus wrote: > On 27.07.23 23:00, Thomas Schwinge wrote:
>>> +  else if (src_devicep != NULL
>>> +	   && (dst_devicep == NULL
>>> +	       || (dst_devicep->capabilities
>>> +		   & GOMP_OFFLOAD_CAP_SHARED_MEM)))
>>

Re: [patch] libgomp: cuda.h and omp_target_memcpy_rect cleanup (was: [patch] OpenMP: Call cuMemcpy2D/cuMemcpy3D for nvptx for omp_target_memcpy_rect)

2023-07-29 Thread Tobias Burnus
Now committed as r14-2865-g8b9e559fe7ca5715c74115322af99dbf9137a399. Tobias On 28.07.23 13:51, Tobias Burnus wrote: > Thanks for proofreading and for the suggestions! Do you have comments on the attached patch? * * * Crossref: For further optimizations, see also https://gcc.gnu.org/PR101581: [OpenM

[patch] libgomp: cuda.h and omp_target_memcpy_rect cleanup (was: [patch] OpenMP: Call cuMemcpy2D/cuMemcpy3D for nvptx for omp_target_memcpy_rect)

2023-07-28 Thread Tobias Burnus
Hi Thomas, thanks for proofreading and for the suggestions! Do you have comments on the attached patch? * * * Crossref: For further optimizations, see also https://gcc.gnu.org/PR101581: [OpenMP] omp_target_memcpy – support inter-device memcpy https://gcc.gnu.org/PR110813: [OpenMP] omp_target_memc

Re: [patch] OpenMP: Call cuMemcpy2D/cuMemcpy3D for nvptx for omp_target_memcpy_rect

2023-07-27 Thread Thomas Schwinge
Hi Tobias! On 2023-07-25T23:45:54+0200, Tobias Burnus wrote: > The attached patch calls CUDA's cuMemcpy2D and cuMemcpy3D > for omp_target_memcpy_rect[,_async] for dim=2/dim=3. This should > speed up the data transfer for noncontiguous data. ACK, thanks. > While being there, I ended up adding s

[patch] OpenMP: Call cuMemcpy2D/cuMemcpy3D for nvptx for omp_target_memcpy_rect

2023-07-25 Thread Tobias Burnus
The attached patch calls CUDA's cuMemcpy2D and cuMemcpy3D for omp_target_memcpy_rect[,_async] for dim=2/dim=3. This should speed up the data transfer for noncontiguous data. While at it, I ended up adding support for device-to-other-device copying; while potentially slow, it is still bette
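For readers unfamiliar with the pitch-based form a 2-D copy takes, here is a minimal host-only sketch, under stated assumptions, of how omp_target_memcpy_rect arguments for num_dims == 2 (volume, offsets and dimensions, row-major) can be expressed as "row width in bytes, number of rows, source pitch, destination pitch". This is the shape of parameters a single cuMemcpy2D call consumes. The names rect2d_copy, copy_rect_2d and memcpy_rect_2d_host are hypothetical. The code does not call CUDA, and it is not the libgomp implementation. The row loop is exactly the work that one 2-D device copy replaces.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical pitch-based descriptor, loosely modeled on the kind of
   parameters a 2-D copy such as cuMemcpy2D takes.  This is NOT the
   CUDA_MEMCPY2D type.  */
struct rect2d_copy
{
  size_t width_bytes; /* contiguous bytes copied per row */
  size_t height;      /* number of rows */
  size_t src_pitch;   /* bytes between consecutive source rows */
  size_t dst_pitch;   /* bytes between consecutive destination rows */
};

/* Host-only reference: one memcpy per row.  A single 2-D device copy
   would replace this whole loop with one call.  */
static void
copy_rect_2d (void *dst, const void *src, const struct rect2d_copy *c)
{
  unsigned char *d = dst;
  const unsigned char *s = src;
  assert (c->width_bytes <= c->src_pitch && c->width_bytes <= c->dst_pitch);
  for (size_t row = 0; row < c->height; row++)
    memcpy (d + row * c->dst_pitch, s + row * c->src_pitch, c->width_bytes);
}

/* Hypothetical helper: translate omp_target_memcpy_rect-style arguments
   for num_dims == 2 (row-major; index 0 = rows, index 1 = columns) into
   the pitch form above, then perform the copy on the host.  */
static void
memcpy_rect_2d_host (void *dst, const void *src, size_t element_size,
                     const size_t volume[2], const size_t dst_offsets[2],
                     const size_t src_offsets[2], const size_t dst_dims[2],
                     const size_t src_dims[2])
{
  struct rect2d_copy c = {
    .width_bytes = volume[1] * element_size,
    .height = volume[0],
    .src_pitch = src_dims[1] * element_size,
    .dst_pitch = dst_dims[1] * element_size,
  };
  /* Byte offset of the first element of each sub-rectangle.  */
  size_t src_start
    = (src_offsets[0] * src_dims[1] + src_offsets[1]) * element_size;
  size_t dst_start
    = (dst_offsets[0] * dst_dims[1] + dst_offsets[1]) * element_size;
  copy_rect_2d ((unsigned char *) dst + dst_start,
                (const unsigned char *) src + src_start, &c);
}
```

For example, copying a 2x3 block of int from offset (1,1) of a 4x5 source into offset (0,1) of a 3x4 destination gives width_bytes = 12, height = 2, src_pitch = 20 and dst_pitch = 16. The 3-D case (cuMemcpy3D) adds a depth and a second stride in the same way.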