On 28.04.23 11:31, Thomas Schwinge wrote:
> On 2023-04-28T10:48:31+0200, Tobias Burnus wrote:
>> I don't think that just calling "exit (EXIT_FAILURE);" is the proper
>> way
> The point is, when we run into such an 'exit', we've already issued an
> error (in the plugin, via 'GOMP_PLUGIN_fatal'),
you m

Hi Tobias!
On 2023-04-28T10:48:31+0200, Tobias Burnus wrote:
> On 21.03.23 16:53, Thomas Schwinge wrote:
>> On 2022-08-26T11:07:28+0200, Tobias Burnus wrote:
>>> This patch adds initial [OpenMP reverse offload] support for nvptx.
>>> CUDA does lock up when trying to copy data from the currently running
>>> stream; hence, a new stream is generated to do the memory copying.

Hi Tobias!
On 2023-04-28T10:28:22+0200, Tobias Burnus wrote:
> maybe I misunderstood your suggestion, but
First, note that those CUDA "Stream Memory Operations" are something that
I found by chance, and don't have any actual experience with. I can't
seem to find a lot of documentation/usage of

Hi Thomas,
On 21.03.23 16:53, Thomas Schwinge wrote:
> On 2022-08-26T11:07:28+0200, Tobias Burnus wrote:
>> This patch adds initial [OpenMP reverse offload] support for nvptx.
>> CUDA does lock up when trying to copy data from the currently running
>> stream; hence, a new stream is generated to do the memory copying.

Hi Thomas,
maybe I misunderstood your suggestion, but "Wait on a memory location"
assumes that the value will eventually change – yet if a target region
happens to contain no reverse offload, the memory location never
changes, while the target region should still return to the host.
What we would need: Wai

Hi!
During GCC/OpenMP/nvptx reverse offload investigations into how to
replace the problematic global 'GOMP_REV_OFFLOAD_VAR', I may have found
something re:
On 2022-08-26T11:07:28+0200, Tobias Burnus wrote:
> Better suggestions are welcome for the busy loop in
> libgomp/plugin/plugin-nvptx.c regarding the variable placement and
> checking its value.

Hi!
On 2023-03-21T16:53:31+0100, I wrote:
> On 2022-08-26T11:07:28+0200, Tobias Burnus wrote:
>> This patch adds initial [OpenMP reverse offload] support for nvptx.
>
>> CUDA does lock up when trying to copy data from the currently running
>> stream; hence, a new stream is generated to do the memory copying.

Hi!
On 2022-08-26T11:07:28+0200, Tobias Burnus wrote:
> This patch adds initial [OpenMP reverse offload] support for nvptx.
> CUDA does lock up when trying to copy data from the currently running
> stream; hence, a new stream is generated to do the memory copying.
As part of other work, where I

On 27.09.22 11:23, Tobias Burnus wrote:
We do support
#if __PTX_SM__ >= 600 (CUDA >= 8.0, ptx isa >= 5.0)
and we can also configure GCC with
--with-arch=sm_70 (or sm_80 or ...)
Thus, adding atomics with .sys scope is possible.
See attached patch. This seems to work fine and I hope I got the
a

On Tue, 27 Sep 2022, Tobias Burnus wrote:
> Ignoring (1), does the overall patch and this part otherwise look okay(ish)?
>
>
> Caveat: The .sys scope works well with >= sm_60 but does not handle
> older versions. For those, the __atomic_{load/store}_n are used. I do not
> see a good solut

Hi,
On 26.09.22 19:45, Alexander Monakov wrote:
> My main concerns remain not addressed:
> 1) what I said in the opening paragraphs of my previous email;
(i.e. the general disagreement about whether the feature itself should
be implemented for nvptx or not.)
> 2) device-issued atomics are not guaranteed

Hi.
My main concerns remain not addressed:
1) what I said in the opening paragraphs of my previous email;
2) device-issued atomics are not guaranteed to appear atomic to the host
unless using atom.sys and translating for CUDA compute capability 6.0+.
Item 2 is a correctness issue. Item 1 I th

Hi Alexander,
On 21.09.22 22:06, Alexander Monakov wrote:
> It also goes against the good practice of accelerator programming,
> which requires queueing work on the accelerator and letting it run
> asynchronously with the CPU with high occupancy.
(I know libgomp still waits for the GPU to finish in ea

Hi.
On the high level, I'd be highly uncomfortable with this. I guess we are in
vague agreement that it cannot be efficiently implemented. It also goes
against the good practice of accelerator programming, which requires queueing
work on the accelerator and letting it run asynchronously with the

@Alexander/@Tom – Can you comment on both libgomp/config/nvptx +
libgomp/plugin/plugin-nvptx.c? (Comments on the rest are welcome, too.)
(Updated patch enclosed.)
Because Jakub asked:
> I'm afraid you need Alexander or Tom here, I don't feel I can review it;
> I could rubber stamp it if they are ok

On Fri, Aug 26, 2022 at 11:07:28AM +0200, Tobias Burnus wrote:
> @Tom and Alexander: Better suggestions are welcome for the busy loop in
> libgomp/plugin/plugin-nvptx.c regarding the variable placement and checking
> its value.
I'm afraid you need Alexander or Tom here, I don't feel I can review it;
I could rubber stamp it if they are ok

On Fri, Aug 26, 2022 at 05:56:09PM +0300, Alexander Monakov via Gcc-patches wrote:
>
> On Fri, 26 Aug 2022, Tobias Burnus wrote:
>
> > @Tom and Alexander: Better suggestions are welcome for the busy loop in
> > libgomp/plugin/plugin-nvptx.c regarding the variable placement and checking
> > its value.

On Fri, 26 Aug 2022, Tobias Burnus wrote:
> @Tom and Alexander: Better suggestions are welcome for the busy loop in
> libgomp/plugin/plugin-nvptx.c regarding the variable placement and checking
> its value.
I think to do that without polling you can use the PTX 'brkpt' instruction on the
device and

@Tom and Alexander: Better suggestions are welcome for the busy loop in
libgomp/plugin/plugin-nvptx.c regarding the variable placement and checking
its value.
PRE-REMARK
As nvptx (and all other plugins) returns <= 0 for
GOMP_OFFLOAD_get_num_devices if GOMP_REQUIRES_REVERSE_OFFLOAD is
set. This