[PATCH] D91590: [NVPTX] Efficently support dynamic index on CUDA kernel aggregate parameters.

Artem Belevich via Phabricator via cfe-commits Tue, 17 Nov 2020 10:27:46 -0800

tra added a reviewer: jlebar.
tra added a comment.

In D91590#2398842 <https://reviews.llvm.org/D91590#2398842>, @hliao wrote:


> This's an experimental or demo-only patch in my spare time on eliminating 
> private memory usage in https://godbolt.org/z/EPPn6h. The attachment 
> F14026286: sample.tar.xz <https://reviews.llvm.org/F14026286> includes both 
> the reference and new IR, PTX, and SASS (sm_60) output. For the new code, 
> that aggregate argument is loaded through `LDC` instruction in SASS instead 
> of `MOV` due to the non-static address. I don't have sm_60 to verify that. 
> Could you try that on the real hardware?

I'll give it a try.

> BTW, from PTX ISA document, parameter space is read-only for input parameters 
> and write-only for output parameters. If that's right, even non-kernel 
> function may also require a similar change as the semantic is different from 
> the language model, where the argument variable could be modified in the 
> function body.

Regular functions currently handle parameters exactly the same way as kernels - 
via a copy to a local buffer, which can then be modified.  
https://godbolt.org/z/W9PY17
So, if we need to change a parameter, it would have to be done on a local copy.



================
Comment at: clang/test/CodeGenCUDA/kernel-args.cu:13-14
 // AMDGCN: define amdgpu_kernel void @_Z6kernel1A(%struct.A addrspace(4)* 
byref(%struct.A) align 8 %{{.+}})
-// NVPTX: define void @_Z6kernel1A(%struct.A* byval(%struct.A) align 8 %x)
+// NVPTX: define void @_Z6kernel1A(%struct.A addrspace(101)* byref(%struct.A) 
align 8 %0)
 __global__ void kernel(A x) {
 }
----------------
Is the idea here to rely on PTX to store the value in param space (so we do 
actually pass the parameter by value)  and represent it on IR level as a 
reference to an an externally-provided storage with the value.
So:
- C++ passes argument by value
- IR knows that PTX will store it somewhere in param space and uses `byref`
- we still generate PTX which has parameter passed by value, but now we can 
access it directly via a reference to param-space value.

Presumably for parameters we do want to modify, we'll need to fall back to 
having a local copy.

So far so good. However, now we may have a problem distinguishing between 
C++-level arguments passed by value vs by reference -- they all will look like 
`byref` on IR level. That is, unless you rely on `addrspace(101)` to indicate 
that it's actually a `byval` in disguise. 

It looks plausible as long as we can guarantee that we never modify it. Neither 
in the current function nor in any of the callees, if we pass it by reference. 

I'm not particularly familiar with AA machinery. I'd appreciate if you could 
elaborate on how you see it all work end-to-end.



Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91590/new/

https://reviews.llvm.org/D91590

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D91590: [NVPTX] Efficently support dynamic index on CUDA kernel aggregate parameters.

Reply via email to