saiislam marked 4 inline comments as done. saiislam added inline comments.
================ Comment at: clang/lib/CodeGen/Targets/AMDGPU.cpp:383 + CGM.getTarget().getTargetOpts().CodeObjectVersion, /*Size=*/32, + llvm::GlobalValue::WeakODRLinkage); +} ---------------- yaxunl wrote: > > I am not sure weak_odr linkage will work when code object version is none. > This will cause conflict when a module emitted with cov none is linked with a > module emitted with cov4 or cov5. Also, when all modules are emitted with cov > none, we end up with a linked module with cov none and the work group size > code will not work. > > Probably we need to emit llvm.amdgcn.abi.version with external linkage for > cov none. > > Another issue is that llvm.amdgcn.abi.version is not internalized. It is > always loaded from memory even though it is in constant address space. This > will cause bad performance. Considering device libs may use clang builtin for > workgroup size. The performance impact may be significant. To avoid > performance degradation, we need to internalize it as early as possible in > the optimization pipeline. I tried external linkage but it didn't work. Only weak_odr is working fine. ================ Comment at: clang/lib/CodeGen/Targets/AMDGPU.cpp:369-386 + if (CGM.getModule().getNamedGlobal(Name)) + return; + + auto *Type = + llvm::IntegerType::getIntNTy(CGM.getModule().getContext(), Size); + auto *GV = new llvm::GlobalVariable( + CGM.getModule(), Type, true, Linkage, ---------------- arsenm wrote: > You moved GetOrCreateLLVMGlobal but don't use it? > > The lamdba is unnecessary for a single local use I am using GetOrCreateLLVMGlobal in CGBuiltin.cpp while emitting code for amdgpu_worgroup_size. ================ Comment at: clang/lib/CodeGen/Targets/AMDGPU.cpp:369-386 + if (CGM.getModule().getNamedGlobal(Name)) + return; + + auto *Type = + llvm::IntegerType::getIntNTy(CGM.getModule().getContext(), Size); + auto *GV = new llvm::GlobalVariable( + CGM.getModule(), Type, true, Linkage, ---------------- saiislam wrote: > arsenm wrote: > > You moved GetOrCreateLLVMGlobal but don't use it? > > > > The lamdba is unnecessary for a single local use > I am using GetOrCreateLLVMGlobal in CGBuiltin.cpp while emitting code for > amdgpu_worgroup_size. I was hoping that this patch will pave way for D130096, so that it can generate rest of the control constants using the same lambda. I can remove this and simplify the code if you want. ================ Comment at: clang/lib/Driver/ToolChain.cpp:1372 + if (SameTripleAsHost || + A->getOption().matches(options::OPT_mcode_object_version_EQ)) DAL->append(A); ---------------- arsenm wrote: > Don't understand why this is necessary This function creates a derived argument list for OpenMP target specific flags. `mcode-object-version` remains unset for device compilation step if we don't pass it here. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D139730/new/ https://reviews.llvm.org/D139730 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits