On 10/21/2015 11:07 AM, Alexander Monakov wrote:
In PTX, stack storage is in the .local address space, and that memory is
thread-private. A thread can make a pointer to its own stack memory and
successfully dereference it, but dereferencing that pointer from other threads
does not work (I observed it returning garbage values).
The reason for .local addresses being private like that, I think, is that
references to .local memory undergo address translation so that simultaneous
accesses to stack slots from threads in a warp form a coalesced memory
transaction. So .local memory locations that look consecutive from an
individual thread's point of view are actually strided in physical memory.
This sounds a little odd. You can convert a .local pointer to a generic
one and dereference the latter. Do you think the same
behind-the-scenes magic is going on for accesses through generic pointers?
Bernd