saiislam added inline comments.
================ Comment at: clang/lib/Driver/ToolChains/Clang.cpp:8649-8650 + // code-object-version=X needs to be passed to clang-linker-wrapper to ensure + // that it is used by lld. + if (const Arg *A = Args.getLastArg(options::OPT_mcode_object_version_EQ)) { ---------------- arsenm wrote: > so device rtl is linked once as a normal library? No, this is command generation for clang-linker-wrapper. Since, devicertl is compiled only to get bitcode file (-c), it is never called. ================ Comment at: clang/lib/Driver/ToolChains/Clang.cpp:8653-8654 + CmdArgs.push_back(Args.MakeArgString("-mllvm")); + CmdArgs.push_back(Args.MakeArgString( + Twine("--amdhsa-code-object-version=") + A->getValue())); + } ---------------- arsenm wrote: > Why do you need this? The code object version is supposed to come from a > module flag. We should be getting rid of the command line argument for it During command generation for clang-linker-wrapper, it is required to check user's provided `mcode-object-version=X` so that `amdhsa-code-object-version=X` can be passed to the clang/lto backend. `getAmdhsaCodeObjectVersion()` and `getHsaAbiVersion()` both still use the above command line argument to override user's choice of COV, instead of the module flag. ================ Comment at: clang/test/CodeGenCUDA/amdgpu-code-object-version-linking.cu:12 +// RUN: llvm-link %t_0 %t_5 -o -| llvm-dis -o - | FileCheck -check-prefix=LINKED5 %s + +#include "Inputs/cuda.h" ---------------- yaxunl wrote: > need to test using clang -cc1 with -O3 and -mlink-builtin-bitcode to link the > device lib and verify the load of llvm.amdgcn.abi.version being eliminated > after optimization. > > I think currently it cannot do that since llvm.amdgcn.abi.version is not > internalized by the internalization pass. This can cause some significant > perf drops since loading is expensive. Need to tweak the function controlling > what variables can be internalized for amdgpu so that this variable gets > internalized, or having a generic way to tell that function which variables > should be internalized, e.g. by adding a metadata amdgcn.internalize load of llvm.amdgcn.abi.version is being eliminated with cc1, -O3, and mlink-builtin-bitcode of device lib. ================ Comment at: clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp:406-410 + // pass on -mllvm options to the clang + for (const opt::Arg *Arg : Args.filtered(OPT_mllvm)) { + CmdArgs.push_back("-mllvm"); + CmdArgs.push_back(Arg->getValue()); + } ---------------- arsenm wrote: > Shouldn't need this? It is required so that when clang pass (not the lto backend) is called from clang-linker-wrapper due to `-save-temps`, user provided COV is correctly propagated. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D139730/new/ https://reviews.llvm.org/D139730 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits