Hi Tobias!
On 2023-04-28T10:28:22+0200, Tobias Burnus wrote:
> maybe I misunderstood your suggestion, but
First, note that those CUDA "Stream Memory Operations" are something that
I found by chance, and don't have any actual experience with. I can't
seem to find a lot of documentation/usage of them.
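For reference, basic usage would presumably look something like the sketch
below (untested, helper names made up; assumes a CUDA context and stream are
already set up and that the device/driver actually support these operations):

    #include <cuda.h>

    /* Enqueue a wait: work submitted to STREAM after this call does not
       start until the 32-bit value at FLAG is >= 1.  The wait is handled
       by the GPU/driver, not by blocking a host thread.  */
    static void
    enqueue_wait_on_flag (CUstream stream, CUdeviceptr flag)
    {
      cuStreamWaitValue32 (stream, flag, 1, CU_STREAM_WAIT_VALUE_GEQ);
    }

    /* The host (via another stream) can later release the waiter by
       writing the expected value.  */
    static void
    release_flag (CUstream other_stream, CUdeviceptr flag)
    {
      cuStreamWriteValue32 (other_stream, flag, 1,
                            CU_STREAM_WRITE_VALUE_DEFAULT);
    }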
Hi Thomas,
maybe I misunderstood your suggestion, but "Wait on a memory location"
assumes that there will be a change. But if a target region happens to
have no reverse offload, the memory location will never change, yet the
target region should still return to the host.
What we would need: Wai
Hi!
During GCC/OpenMP/nvptx reverse offload investigations into how to replace
the problematic global 'GOMP_REV_OFFLOAD_VAR', I may have found something
re:
On 2022-08-26T11:07:28+0200, Tobias Burnus wrote:
> Better suggestions are welcome for the busy loop in
> libgomp/plugin/plugin-nvptx.c regarding the variable placement and checking
> its value.
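For context, the busy loop in question is roughly of the following shape (an
illustrative sketch only, with made-up names, not the actual plugin code):
after launching the kernel asynchronously, the host keeps polling a flag the
device can write to until the kernel completes.

    #include <cuda.h>
    #include <sched.h>
    #include <stdint.h>

    static void
    poll_reverse_offload (CUstream stream, volatile uint64_t *rev_request)
    {
      /* Poll until the asynchronously launched kernel has finished.  */
      while (cuStreamQuery (stream) == CUDA_ERROR_NOT_READY)
        {
          /* REV_REQUEST is assumed to be host-accessible memory that the
             device writes to when it wants a reverse offload.  */
          uint64_t req = __atomic_load_n (rev_request, __ATOMIC_ACQUIRE);
          if (req != 0)
            {
              /* ... run the host function described by REQ, then clear
                 the flag so the device-side wait can proceed ...  */
              __atomic_store_n (rev_request, 0, __ATOMIC_RELEASE);
            }
          sched_yield ();  /* Avoid hammering the flag.  */
        }
    }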
On 27.09.22 11:23, Tobias Burnus wrote:
We do support
#if __PTX_SM__ >= 600 (CUDA >= 8.0, PTX ISA >= 5.0)
and we can also configure GCC with
--with-arch=sm_70 (or sm_80 or ...).
Thus, adding atomics with .sys scope is possible.
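Roughly, the device-side idea is something like the following sketch
(illustrative only; the exact PTX mnemonic and the SM/ISA cut-off may differ
from what the attached patch actually does):

    #include <stdint.h>

    static inline void
    store_release_sys (uint64_t *ptr, uint64_t val)
    {
    #if __PTX_SM__ >= 600
      /* System-scope store, so the polling host observes it atomically.  */
      asm volatile ("st.global.release.sys.u64 [%0], %1;"
                    : : "r" (ptr), "r" (val) : "memory");
    #else
      /* Older ISAs: fall back to the plain atomic built-in (no .sys scope).  */
      __atomic_store_n (ptr, val, __ATOMIC_RELEASE);
    #endif
    }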
See attached patch. This seems to work fine and I hope I got the
a
On Tue, 27 Sep 2022, Tobias Burnus wrote:
> Ignoring (1), does the overall patch and this part otherwise look okay(ish)?
>
>
> Caveat: The .sys scope works well with >= sm_60 but does not handle
> older versions. For those, the __atomic_{load/store}_n built-ins are used.
> I do not see a good solution
Hi,
On 26.09.22 19:45, Alexander Monakov wrote:
My main concerns remain not addressed:
1) what I said in the opening paragraphs of my previous email;
(i.e. the general disagreement about whether the feature itself should be
implemented for nvptx or not.)
2) device-issued atomics are not guaranteed
Hi.
My main concerns remain not addressed:
1) what I said in the opening paragraphs of my previous email;
2) device-issued atomics are not guaranteed to appear atomic to the host
unless using atom.sys and translating for CUDA compute capability 6.0+.
Item 2 is a correctness issue. Item 1 I th
Hi Alexander,
On 21.09.22 22:06, Alexander Monakov wrote:
It also goes
against the good practice of accelerator programming, which requires queueing
work on the accelerator and letting it run asynchronously with the CPU with high
occupancy.
(I know libgomp still waits for the GPU to finish in ea
Hi.
At a high level, I'd be highly uncomfortable with this. I guess we are in
vague agreement that it cannot be efficiently implemented. It also goes
against the good practice of accelerator programming, which requires queueing
work on the accelerator and letting it run asynchronously with the CPU with
high occupancy.
@Alexander/@Tom – Can you comment on both libgomp/config/nvptx +
libgomp/plugin/plugin-nvptx.c? (Comments on the rest are welcome, too)
(Updated patch enclosed)
Because Jakub asked:
I'm afraid you need Alexander or Tom here, I don't feel I can review it;
I could rubber stamp it if they are ok
On Fri, Aug 26, 2022 at 11:07:28AM +0200, Tobias Burnus wrote:
> @Tom and Alexander: Better suggestions are welcome for the busy loop in
> libgomp/plugin/plugin-nvptx.c regarding the variable placement and checking
> its value.
I'm afraid you need Alexander or Tom here, I don't feel I can review it;
I could rubber stamp it if they are ok
On Fri, Aug 26, 2022 at 05:56:09PM +0300, Alexander Monakov via Gcc-patches
wrote:
>
> On Fri, 26 Aug 2022, Tobias Burnus wrote:
>
> > @Tom and Alexander: Better suggestions are welcome for the busy loop in
> > libgomp/plugin/plugin-nvptx.c regarding the variable placement and checking
> > its value.
On Fri, 26 Aug 2022, Tobias Burnus wrote:
> @Tom and Alexander: Better suggestions are welcome for the busy loop in
> libgomp/plugin/plugin-nvptx.c regarding the variable placement and checking
> its value.
I think to do that without polling you can use the PTX 'brkpt' instruction on the
device and