Re: [PATCH] libgomp, nvptx, amdgcn: parallel reverse offload

2023-09-21 Tobias Burnus
Hi Andrew, hi Thomas, hi all,
@Thomas: I wouldn't mind if you could glance at the nvptx/CUDA bits.
On 12.09.23 16:27, Andrew Stubbs wrote:
This patch implements parallel execution of OpenMP reverse offload kernels.
...
The device threads that sent requests are still blocked waiting for the comp

[PATCH] libgomp, nvptx, amdgcn: parallel reverse offload

2023-09-12 Andrew Stubbs
Hi all,
This patch implements parallel execution of OpenMP reverse offload kernels.
The first problem was that GPU device kernels may request reverse offload (via the "ancestor" clause) once for each running offload thread -- of which there may be thousands -- and the existing implementation