================
@@ -5692,7 +5692,10 @@ CGCallee CodeGenFunction::EmitCallee(const Expr *E) {
     // Resolve direct calls.
   } else if (auto DRE = dyn_cast<DeclRefExpr>(E)) {
     if (auto FD = dyn_cast<FunctionDecl>(DRE->getDecl())) {
-      return EmitDirectCallee(*this, FD);
+      auto CalleeDecl = FD->hasAttr<OpenCLKernelAttr>()
+                            ? GlobalDecl(FD, KernelReferenceKind::Stub)
+                            : FD;
+      return EmitDirectCallee(*this, CalleeDecl);
----------------
rjmccall wrote:

Hmm. It looks like the CUDA folks had this same problem and came up with an awkward workaround for it in `EmitDirectCallee`. We should really just be requesting the right GD in the first place. Could you add a `getGlobalDeclForDirectCall` function that does the right thing for both modes? If it ends up causing complicated behavior/test changes in CUDA mode, you can feel free to exclude CUDA for now and just leave a comment saying that the workaround should be removed in favor of doing the right thing in that function.

https://github.com/llvm/llvm-project/pull/115821
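
For illustration, the shape of the requested helper might look something like the sketch below. This is not the code from the patch: `FunctionDecl`, `GlobalDecl`, and `KernelReferenceKind` here are small stand-in types written for this example (clang's real classes are far richer), the `IsOpenCLKernel` flag is a hypothetical substitute for `FD->hasAttr<OpenCLKernelAttr>()`, and the mock's default reference kind is an assumption. Only the OpenCL case is handled, with CUDA left out as the comment above allows.

```cpp
// Stand-in types for illustration only; these are NOT clang's real classes.
enum class KernelReferenceKind { Kernel, Stub };

struct FunctionDecl {
  // Hypothetical flag modelling FD->hasAttr<OpenCLKernelAttr>().
  bool IsOpenCLKernel = false;
};

class GlobalDecl {
  const FunctionDecl *D;
  KernelReferenceKind Kind;

public:
  // Assumption: the mock defaults to KernelReferenceKind::Kernel when no
  // kind is requested explicitly.
  GlobalDecl(const FunctionDecl *FD,
             KernelReferenceKind K = KernelReferenceKind::Kernel)
      : D(FD), Kind(K) {}

  const FunctionDecl *getDecl() const { return D; }
  KernelReferenceKind getKernelReferenceKind() const { return Kind; }
};

// Sketch of the helper named in the review: pick the GlobalDecl that a
// direct call should target, so callers request the right one up front.
static GlobalDecl getGlobalDeclForDirectCall(const FunctionDecl *FD) {
  // A direct call to an OpenCL kernel should go to the stub.
  if (FD->IsOpenCLKernel)
    return GlobalDecl(FD, KernelReferenceKind::Stub);

  // CUDA is deliberately not handled in this sketch. Per the review, the
  // existing workaround in EmitDirectCallee should eventually be removed
  // in favor of doing the right thing in this function.
  return GlobalDecl(FD);
}
```

With a helper along these lines, the call site in `EmitCallee` could plausibly reduce to `return EmitDirectCallee(*this, getGlobalDeclForDirectCall(FD));`, keeping the choice of reference kind in one place.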