[clang] [Clang][OpenCL][AMDGPU] Allow a kernel to call another kernel (PR #115821)

John McCall via cfe-commits Tue, 03 Dec 2024 01:10:32 -0800

================
@@ -1085,8 +1085,10 @@ llvm::Value *CodeGenFunction::EmitBlockLiteral(const CGBlockInfo &blockInfo) {
       blockAddr.getPointer(), ConvertType(blockInfo.getBlockExpr()->getType()));
 
   if (IsOpenCL) {
-    CGM.getOpenCLRuntime().recordBlockInfo(blockInfo.BlockExpression, InvokeFn,
-                                           result, blockInfo.StructureType);
+    CGM.getOpenCLRuntime().recordBlockInfo(
+        blockInfo.BlockExpression, InvokeFn, result, blockInfo.StructureType,
+        CurGD && CurGD.isDeclOpenCLKernel() &&
+            (CurGD.getKernelReferenceKind() == KernelReferenceKind::Kernel));
----------------
rjmccall wrote:


That's one of the options, yeah. You'd always emit stubs as directly calling 
their kernels (so `_clang_ocl_kern_imp_caller_kern` would actually just call 
`caller_kern`, at least coming out of IRGen), and then you'd let LLVM decide 
when and whether to inline. The main disadvantage is that there might not 
already be code to emit calls to kernels; I imagine that's normally only done 
by the OpenCL runtime, and the ABI is probably pretty different from the normal 
call ABI.  But maybe that's not a big deal.

The second option is that you could emit stubs by making sure the kernel is 
emitted first and then directly cloning the kernel body into the stub. Whether 
this is materially different from just inlining, I don't know, but it's an 
option. We do have code for this sort of thing already; we need it for 
emitting virtual varargs thunks in C++.

The third option is that you could emit kernels as directly calling their 
stubs. The advantage here is that all the calls are just normal calls that you 
definitely already have code to handle. The disadvantage is that you'd always 
be forcing the stub to be emitted, even in the 99% case that there are no other 
uses of it. It'll probably get reliably inlined away, but still, it's burning 
some extra compile time.

The last option is that you could emit kernels by emitting the stub and then 
directly cloning the function body into the kernel.

Up to you; they all have trade-offs. But I do think you need to avoid emitting 
the kernel body twice.
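
For illustration only, the third option (kernels calling their stubs) can be modeled in plain C++. This is not Clang or OpenCL code: the `_stub` functions, the `launch` helper, and the arithmetic in the bodies are all hypothetical stand-ins. The sketch only shows the shape of the call graph, with each body existing once in its stub, the kernel entry point reduced to a forwarding wrapper, and a kernel-to-kernel call lowered to an ordinary call to the callee's stub.

```cpp
#include <vector>

// Hypothetical stub for callee_kern: an ordinary function that holds the
// single copy of the kernel body.
void callee_kern_stub(std::vector<int>& out, int id) {
    out[id] = id * 2;
}

// Hypothetical kernel entry point: a thin wrapper that forwards to the stub
// with a normal call, so the body is not emitted a second time.
void callee_kern(std::vector<int>& out, int id) {
    callee_kern_stub(out, id);
}

// Stub for caller_kern. The source-level call to callee_kern is modeled as a
// normal call to callee_kern_stub rather than to the kernel entry point.
void caller_kern_stub(std::vector<int>& out, int id) {
    callee_kern_stub(out, id);
    out[id] += 1;
}

// Kernel entry point for caller_kern, again only a forwarding wrapper.
void caller_kern(std::vector<int>& out, int id) {
    caller_kern_stub(out, id);
}

// Hypothetical stand-in for the runtime: runs a kernel entry point once per
// work item and returns the output buffer.
std::vector<int> launch(void (*kernel)(std::vector<int>&, int), int n) {
    std::vector<int> out(n, 0);
    for (int id = 0; id < n; ++id) {
        kernel(out, id);
    }
    return out;
}
```

In this model, `launch(caller_kern, 4)` goes through the kernel wrapper, while the nested call never touches `callee_kern` itself. The first option would flip the direction of the forwarding calls (stub calls kernel), and the two cloning options would replace the forwarding call with a copy of the body.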

https://github.com/llvm/llvm-project/pull/115821
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
