Hi Andrew, hi Thomas, hi all,
@Thomas: I wouldn't mind if you could glance at the nvptx/CUDA bits.
On 12.09.23 16:27, Andrew Stubbs wrote:
> This patch implements parallel execution of OpenMP reverse offload
> kernels.
...
> The device threads that sent requests are still blocked waiting for
> the comp
Hi all,
This patch implements parallel execution of OpenMP reverse offload kernels.
The first problem was that GPU device kernels may request reverse
offload (via the "ancestor" clause) once for each running offload thread
-- of which there may be thousands -- and the existing implementation