[PATCH] D139730: [OpenMP][DeviceRTL][AMDGPU] Support code object version 5

Saiyedul Islam via Phabricator via cfe-commits Thu, 24 Aug 2023 10:06:49 -0700

saiislam added inline comments.


================
Comment at: clang/lib/Driver/ToolChains/Clang.cpp:8649-8650
 
+  // code-object-version=X needs to be passed to clang-linker-wrapper to ensure
+  // that it is used by lld.
+  if (const Arg *A = Args.getLastArg(options::OPT_mcode_object_version_EQ)) {
----------------
arsenm wrote:
> so device rtl is linked once as a normal library?
No, this is command generation for clang-linker-wrapper. Since, devicertl is 
compiled only to get bitcode file (-c), it is never called.


================
Comment at: clang/lib/Driver/ToolChains/Clang.cpp:8653-8654
+    CmdArgs.push_back(Args.MakeArgString("-mllvm"));
+    CmdArgs.push_back(Args.MakeArgString(
+        Twine("--amdhsa-code-object-version=") + A->getValue()));
+  }
----------------
arsenm wrote:
> Why do you need this? The code object version is supposed to come from a 
> module flag. We should be getting rid of the command line argument for it
During command generation for clang-linker-wrapper, it is required to check 
user's provided `mcode-object-version=X` so that `amdhsa-code-object-version=X` 
can be passed to the clang/lto backend.

`getAmdhsaCodeObjectVersion()` and `getHsaAbiVersion()` both still use the 
above command line argument to override user's choice of COV, instead of the 
module flag.


================
Comment at: clang/test/CodeGenCUDA/amdgpu-code-object-version-linking.cu:12
+// RUN: llvm-link %t_0 %t_5 -o -| llvm-dis -o - | FileCheck 
-check-prefix=LINKED5 %s
+
+#include "Inputs/cuda.h"
----------------
yaxunl wrote:
> need to test using clang -cc1 with -O3 and -mlink-builtin-bitcode to link the 
> device lib and verify the load of llvm.amdgcn.abi.version being eliminated 
> after optimization.
> 
> I think currently it cannot do that since llvm.amdgcn.abi.version is not 
> internalized by the internalization pass. This can cause some significant 
> perf drops since loading is expensive. Need to tweak the function controlling 
> what variables can be internalized for amdgpu so that this variable gets 
> internalized, or having a generic way to tell that function which variables 
> should be internalized, e.g. by adding a metadata amdgcn.internalize
load of llvm.amdgcn.abi.version is being eliminated with cc1, -O3, and 
mlink-builtin-bitcode of device lib.


================
Comment at: clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp:406-410
+  // pass on -mllvm options to the clang
+  for (const opt::Arg *Arg : Args.filtered(OPT_mllvm)) {
+    CmdArgs.push_back("-mllvm");
+    CmdArgs.push_back(Arg->getValue());
+  }
----------------
arsenm wrote:
> Shouldn't need this?
It is required so that when clang pass (not the lto backend) is called from 
clang-linker-wrapper due to `-save-temps`, user provided COV is correctly 
propagated.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139730/new/

https://reviews.llvm.org/D139730

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D139730: [OpenMP][DeviceRTL][AMDGPU] Support code object version 5

Reply via email to