[PATCH] D139730: [OpenMP][DeviceRTL][AMDGPU] Support code object version 5

Saiyedul Islam via Phabricator via cfe-commits Mon, 21 Aug 2023 11:26:07 -0700

saiislam marked 4 inline comments as done.
saiislam added inline comments.


================
Comment at: clang/lib/CodeGen/Targets/AMDGPU.cpp:383
+            CGM.getTarget().getTargetOpts().CodeObjectVersion, /*Size=*/32,
+            llvm::GlobalValue::WeakODRLinkage);
+}
----------------
yaxunl wrote:
> 
> I am not sure weak_odr linkage will work when code object version is none. 
> This will cause conflict when a module emitted with cov none is linked with a 
> module emitted with cov4 or cov5. Also, when all modules are emitted with cov 
> none, we end up with a linked module with cov none and the work group size 
> code will not work.
> 
> Probably we need to emit llvm.amdgcn.abi.version with external linkage for 
> cov none.
> 
> Another issue is that llvm.amdgcn.abi.version is not internalized. It is 
> always loaded from memory even though it is in constant address space. This 
> will cause bad performance. Considering device libs may use clang builtin for 
> workgroup size. The performance impact may be significant. To avoid 
> performance degradation, we need to internalize it as early as possible in 
> the optimization pipeline.
I tried external linkage but it didn't work. Only weak_odr is working fine.


================
Comment at: clang/lib/CodeGen/Targets/AMDGPU.cpp:369-386
+    if (CGM.getModule().getNamedGlobal(Name))
+      return;
+
+    auto *Type =
+        llvm::IntegerType::getIntNTy(CGM.getModule().getContext(), Size);
+    auto *GV = new llvm::GlobalVariable(
+        CGM.getModule(), Type, true, Linkage,
----------------
arsenm wrote:
> You moved GetOrCreateLLVMGlobal but don't use it? 
> 
> The lamdba is unnecessary for a single local use
I am using GetOrCreateLLVMGlobal in CGBuiltin.cpp while emitting code for 
amdgpu_worgroup_size.


================
Comment at: clang/lib/CodeGen/Targets/AMDGPU.cpp:369-386
+    if (CGM.getModule().getNamedGlobal(Name))
+      return;
+
+    auto *Type =
+        llvm::IntegerType::getIntNTy(CGM.getModule().getContext(), Size);
+    auto *GV = new llvm::GlobalVariable(
+        CGM.getModule(), Type, true, Linkage,
----------------
saiislam wrote:
> arsenm wrote:
> > You moved GetOrCreateLLVMGlobal but don't use it? 
> > 
> > The lamdba is unnecessary for a single local use
> I am using GetOrCreateLLVMGlobal in CGBuiltin.cpp while emitting code for 
> amdgpu_worgroup_size.
I was hoping that this patch will pave way for D130096, so that it can generate 
rest of the control constants using the same lambda.
I can remove this and simplify the code if you want.


================
Comment at: clang/lib/Driver/ToolChain.cpp:1372
+      if (SameTripleAsHost ||
+          A->getOption().matches(options::OPT_mcode_object_version_EQ))
         DAL->append(A);
----------------
arsenm wrote:
> Don't understand why this is necessary
This function creates a derived argument list for OpenMP target specific flags.
`mcode-object-version` remains unset for device compilation step if we don't 
pass it here.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139730/new/

https://reviews.llvm.org/D139730

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D139730: [OpenMP][DeviceRTL][AMDGPU] Support code object version 5

Reply via email to