Re: libgomp: Simplify OpenMP reverse offload host <-> device memory copy implementation (was: [Patch] libgomp/nvptx: Prepare for reverse-offload callback handling)

2023-04-28 Thread Tobias Burnus
On 28.04.23 11:31, Thomas Schwinge wrote: On 2023-04-28T10:48:31+0200, Tobias Burnus wrote: I don't think that just calling "exit (EXIT_FAILURE);" is the proper way The point is, when we run into such an 'exit', we've already issued an error (in the plugin, via 'GOMP_PLUGIN_fatal'), you m

Re: libgomp: Simplify OpenMP reverse offload host <-> device memory copy implementation (was: [Patch] libgomp/nvptx: Prepare for reverse-offload callback handling)

2023-04-28 Thread Thomas Schwinge
Hi Tobias! On 2023-04-28T10:48:31+0200, Tobias Burnus wrote: > On 21.03.23 16:53, Thomas Schwinge wrote: >> On 2022-08-26T11:07:28+0200, Tobias Burnus >> wrote: >>> This patch adds initial [OpenMP reverse offload] support for nvptx. >>> CUDA does lockup when trying to copy data from the currentl

Re: [Patch] libgomp/nvptx: Prepare for reverse-offload callback handling

2023-04-28 Thread Thomas Schwinge
Hi Tobias! On 2023-04-28T10:28:22+0200, Tobias Burnus wrote: > maybe I misunderstood your suggestion, but First, note that those CUDA "Stream Memory Operations" are something that I found by chance, and don't have any actual experience with. I can't seem to find a lot of documentation/usage of

Re: libgomp: Simplify OpenMP reverse offload host <-> device memory copy implementation (was: [Patch] libgomp/nvptx: Prepare for reverse-offload callback handling)

2023-04-28 Thread Tobias Burnus
Hi Thomas, On 21.03.23 16:53, Thomas Schwinge wrote: On 2022-08-26T11:07:28+0200, Tobias Burnus wrote: This patch adds initial [OpenMP reverse offload] support for nvptx. CUDA does lockup when trying to copy data from the currently running stream; hence, a new stream is generated to do the mem

Re: [Patch] libgomp/nvptx: Prepare for reverse-offload callback handling

2023-04-28 Thread Tobias Burnus
Hi Thomas, maybe I misunderstood your suggestion, but "Wait on a memory location" assumes that there will be a change – but if a target region happens to have no reverse offload, the memory location will never change, but still the target region should return to the host. What we would need: Wai

Re: [Patch] libgomp/nvptx: Prepare for reverse-offload callback handling

2023-04-04 Thread Thomas Schwinge
Hi! During GCC/OpenMP/nvptx reverse offload investigations, about how to replace the problematic global 'GOMP_REV_OFFLOAD_VAR', I may have found something re: On 2022-08-26T11:07:28+0200, Tobias Burnus wrote: > Better suggestions are welcome for the busy loop in > libgomp/plugin/plugin-nvptx.c r

[og12] libgomp: Simplify OpenMP reverse offload host <-> device memory copy implementation (was: [Patch] libgomp/nvptx: Prepare for reverse-offload callback handling)

2023-03-24 Thread Thomas Schwinge
Hi! On 2023-03-21T16:53:31+0100, I wrote: > On 2022-08-26T11:07:28+0200, Tobias Burnus wrote: >> This patch adds initial [OpenMP reverse offload] support for nvptx. > >> CUDA does lockup when trying to copy data from the currently running >> stream; hence, a new stream is generated to do the memo

libgomp: Simplify OpenMP reverse offload host <-> device memory copy implementation (was: [Patch] libgomp/nvptx: Prepare for reverse-offload callback handling)

2023-03-21 Thread Thomas Schwinge
Hi! On 2022-08-26T11:07:28+0200, Tobias Burnus wrote: > This patch adds initial [OpenMP reverse offload] support for nvptx. > CUDA does lockup when trying to copy data from the currently running > stream; hence, a new stream is generated to do the memory copying. As part of other work, where I

Re: [Patch] libgomp/nvptx: Prepare for reverse-offload callback handling

2022-10-02 Thread Tobias Burnus
On 27.09.22 11:23, Tobias Burnus wrote: We do support #if __PTX_SM__ >= 600 (CUDA >= 8.0, ptx isa >= 5.0) and we also can configure GCC with --with-arch=sm_70 (or sm_80 or ...) Thus, adding atomics with .sys scope is possible. See attached patch. This seems to work fine and I hope I got the a

Re: [Patch] libgomp/nvptx: Prepare for reverse-offload callback handling

2022-09-28 Thread Alexander Monakov via Gcc-patches
On Tue, 27 Sep 2022, Tobias Burnus wrote: > Ignoring (1), does the overall patch and this part otherwise look okay(ish)? > > > Caveat: The .sys scope works well with >= sm_60 but does not handle > older versions. For those, the __atomic_{load/store}_n are used. I do not > see a good solut

Re: [Patch] libgomp/nvptx: Prepare for reverse-offload callback handling

2022-09-27 Thread Tobias Burnus
Hi, On 26.09.22 19:45, Alexander Monakov wrote: My main concerns remain not addressed: 1) what I said in the opening paragraphs of my previous email; (i.e. the general disagreement whether the feature itself should be implemented for nvptx or not.) 2) device-issued atomics are not guaranteed

Re: [Patch] libgomp/nvptx: Prepare for reverse-offload callback handling

2022-09-26 Thread Alexander Monakov via Gcc-patches
Hi. My main concerns remain not addressed: 1) what I said in the opening paragraphs of my previous email; 2) device-issued atomics are not guaranteed to appear atomic to the host unless using atom.sys and translating for CUDA compute capability 6.0+. Item 2 is a correctness issue. Item 1 I th

Re: [Patch] libgomp/nvptx: Prepare for reverse-offload callback handling

2022-09-26 Thread Tobias Burnus
Hi Alexander, On 21.09.22 22:06, Alexander Monakov wrote: It also goes against the good practice of accelerator programming, which requires queueing work on the accelerator and letting it run asynchronously with the CPU with high occupancy. (I know libgomp still waits for the GPU to finish in ea

Re: [Patch] libgomp/nvptx: Prepare for reverse-offload callback handling

2022-09-21 Thread Alexander Monakov via Gcc-patches
Hi. On the high level, I'd be highly uncomfortable with this. I guess we are in vague agreement that it cannot be efficiently implemented. It also goes against the good practice of accelerator programming, which requires queueing work on the accelerator and letting it run asynchronously with the

Re: [Patch] libgomp/nvptx: Prepare for reverse-offload callback handling

2022-09-13 Thread Tobias Burnus
@Alexander/@Tom – Can you comment on both libgomp/config/nvptx + libgomp/plugin/plugin-nvptx.c ? (Comments on the rest are welcome, too) (Updated patch enclosed) Because Jakub asked: I'm afraid you need Alexander or Tom here, I don't feel I can review it; I could rubber stamp it if they are ok

Re: [Patch] libgomp/nvptx: Prepare for reverse-offload callback handling

2022-09-09 Thread Jakub Jelinek via Gcc-patches
On Fri, Aug 26, 2022 at 11:07:28AM +0200, Tobias Burnus wrote: > @Tom and Alexander: Better suggestions are welcome for the busy loop in > libgomp/plugin/plugin-nvptx.c regarding the variable placement and checking > its value. I'm afraid you need Alexander or Tom here, I don't feel I can review i

Re: [Patch] libgomp/nvptx: Prepare for reverse-offload callback handling

2022-09-09 Thread Jakub Jelinek via Gcc-patches
On Fri, Aug 26, 2022 at 05:56:09PM +0300, Alexander Monakov via Gcc-patches wrote: > > On Fri, 26 Aug 2022, Tobias Burnus wrote: > > > @Tom and Alexander: Better suggestions are welcome for the busy loop in > > libgomp/plugin/plugin-nvptx.c regarding the variable placement and checking > > its v

Re: [Patch] libgomp/nvptx: Prepare for reverse-offload callback handling

2022-08-26 Thread Alexander Monakov via Gcc-patches
On Fri, 26 Aug 2022, Tobias Burnus wrote: > @Tom and Alexander: Better suggestions are welcome for the busy loop in > libgomp/plugin/plugin-nvptx.c regarding the variable placement and checking > its value. I think to do that without polling you can use PTX 'brkpt' instruction on the device and

[Patch] libgomp/nvptx: Prepare for reverse-offload callback handling

2022-08-26 Thread Tobias Burnus
@Tom and Alexander: Better suggestions are welcome for the busy loop in libgomp/plugin/plugin-nvptx.c regarding the variable placement and checking its value. PRE-REMARK As nvptx (and all other plugins) returns <= 0 for GOMP_OFFLOAD_get_num_devices if GOMP_REQUIRES_REVERSE_OFFLOAD is set. This