[PATCH] D120566: [OpenCL][AMDGPU]: Do not allow a call to kernel

John McCall via Phabricator via cfe-commits Fri, 25 Feb 2022 15:16:54 -0800

rjmccall added a comment.

In D120566#3346533 <https://reviews.llvm.org/D120566#3346533>, @arsenm wrote:

> In D120566#3346506 <https://reviews.llvm.org/D120566#3346506>, @rjmccall 
> wrote:
>
>> Is there something which stops you from taking the address of a kernel and 
>> then calling it?  If not, are there actually any uses of kernels in the 
>> module that shouldn't be rewritten as uses of the clone?
>
> The actual amdgpu_kernel is uncallable and has a totally different ABI, and 
> is invoked by external driver code. From the user's device code perspective, 
> only the callable function version is meaningful.

I think you're misunderstanding what I'm asking.  I believe that in OpenCL, you 
can do `&someKernelFunction` in source code and then call that.  The rewrite in 
this patch does not handle non-call uses of the kernel function and so will 
continue to miscompile them.

>> I feel like this would be a lot easier to just fix in your LLVM pass so that 
>> you rewrite any uses of a kernel to use a clone instead before you rewrite 
>> the kernel.
>
> Then we can't ban calls to kernels (and would be pushing some kind of symbol 
> naming conflict decision into the backend) and in principle would have to 
> handle this actual call.

Okay, this is not an accurate description of what you're trying to do, and this 
is important to be precise about.  You are not "banning calls to kernels", 
which would be a novel language restriction and make you non-conformant to 
OpenCL.  You still have a language requirement to allow code to directly use 
kernel functions.  That is why this patch is modifying IR generation instead of 
emitting new errors in Sema.

What's happening here is that your target (very reasonably) requires kernels to 
have a special kernel entrypoint in order to be called from outside.  That 
entrypoint uses a very different ABI from ordinary functions, one which 
simplifies being dynamically called by the runtime, and so it is important that 
ordinary uses of the function don't accidentally resolve against that special 
entrypoint.  You therefore need two different functions for the kernel, one to 
satisfy standard uses and one to act as the kernel entrypoint.

Your current architecture is to generate code normally, which will produce 
what's roughly the standard entrypoint, and then have a backend pass break that 
down and produce a kernel entrypoint.  I can understand why you find that 
frustratingly limited, and I agree that it doesn't seem to handle standard uses 
correctly.  Something needs to change here.

Now, Clang supports many different kernel languages, all of which face very 
similar language/implementation problems.  It is therefore always informative 
to go check to see how other language implementors have tried to solve the 
problem you're facing.  So if you go and look at how CUDA is implemented in 
Clang, you will see that they have introduced a "kernel reference kind" to 
`GlobalDecl`, which allows them to distinguish between the kernel entrypoint 
and the standard entrypoint in IRGen.  You could very easily build on this in 
your OpenCL implementation so that Clang emits the standard entrypoint and then 
either your pass or IRGen itself fills in the kernel entrypoint by marshaling 
arguments and then calling (presumably in a way that forces inlining) the 
standard entrypoint.  That would also give you total control of how arguments 
are marshaled in the kernel entrypoint.

Alternatively, I think cloning the standard entrypoint so that uses of it are 
rewritten to a clone is reasonable enough.  I don't really see why doing the 
cloning in IRGen is necessary when you already have a module pass that does 
similar kinds of rewriting.  Doing the clone in IRGen also does not seem to 
move you closer to your goal of actually marshaling arguments differently.  
Most importantly, though, I believe you do need to rewrite all the uses and not 
just the direct calls.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120566/new/

https://reviews.llvm.org/D120566

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D120566: [OpenCL][AMDGPU]: Do not allow a call to kernel

Reply via email to