https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104893
Tom de Vries <vries at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |WORKSFORME
             Status|UNCONFIRMED                 |RESOLVED
--- Comment #2 from Tom de Vries <vries at gcc dot gnu.org> ---
(In reply to Tom de Vries from comment #1)
> (In reply to Tom de Vries from comment #0)
> > The per-thread call stack is handled for .local memory by the CUDA driver.
> >
> > For the 'soft stack' that's not the case.
>
> Hmm, actually there's .local memory used, just not "directly". Possibly the
> documentation needs updating to point that out.
>
> So, there doesn't seem to be an issue related to overlapping storage.
>
> So I wonder, is the stack pointer also per thread then? Or still per-warp?
OK, here ( https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97203#c6 ) we read:
...
The pointer is switched between per-warp global memory and per-lane local
memory.
...
So, I think this should be fine then.
Marking this resolved-worksforme until we run into an actual failing test-case.
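
For illustration only, here is a host-side C model of the switching described in the PR97203 quote above. It is a sketch, not the nvptx backend's code: the names warp_model, soft_stack_base and distinct_bases, the 64-byte stack size and the per_lane_mode flag are all hypothetical, and plain arrays stand in for global and .local memory. It only shows why overlapping storage is not a concern once the pointer is per-lane: every lane then gets its own base.

```c
#include <assert.h>

#define WARP_SIZE   32
#define STACK_BYTES 64   /* hypothetical size, chosen for the example */

/* Hypothetical model of one warp's soft-stack storage.  */
struct warp_model {
    char warp_stack[STACK_BYTES];             /* stands in for per-warp global memory */
    char lane_stack[WARP_SIZE][STACK_BYTES];  /* stands in for per-lane .local memory */
};

/* Return the stack base a given lane would use in the given mode.  */
static char *soft_stack_base(struct warp_model *w, int lane, int per_lane_mode)
{
    assert(lane >= 0 && lane < WARP_SIZE);
    return per_lane_mode ? w->lane_stack[lane] : w->warp_stack;
}

/* Count how many distinct stack bases the lanes of the warp see.  */
static int distinct_bases(struct warp_model *w, int per_lane_mode)
{
    int count = 0;
    for (int i = 0; i < WARP_SIZE; i++) {
        int seen = 0;
        for (int j = 0; j < i; j++)
            if (soft_stack_base(w, i, per_lane_mode)
                == soft_stack_base(w, j, per_lane_mode))
                seen = 1;
        if (!seen)
            count++;
    }
    return count;
}
```

With a struct warp_model w, distinct_bases(&w, 0) gives 1 (all lanes share the per-warp stack) and distinct_bases(&w, 1) gives 32 (one private stack per lane).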