yaxunl accepted this revision.
yaxunl added a comment.

LGTM. Thanks

================
Comment at: clang/test/CodeGenCUDA/amdgpu-code-object-version-linking.cu:12
+// RUN: llvm-link %t_0 %t_5 -o -| llvm-dis -o - | FileCheck -check-prefix=LINKED5 %s
+
+#include "Inputs/cuda.h"
----------------
saiislam wrote:
> yaxunl wrote:
> > need to test using clang -cc1 with -O3 and -mlink-builtin-bitcode to link
> > the device lib and verify the load of llvm.amdgcn.abi.version being
> > eliminated after optimization.
> >
> > I think currently it cannot do that since llvm.amdgcn.abi.version is not
> > internalized by the internalization pass. This can cause some significant
> > perf drops since loading is expensive. Need to tweak the function
> > controlling what variables can be internalized for amdgpu so that this
> > variable gets internalized, or having a generic way to tell that function
> > which variables should be internalized, e.g. by adding a metadata
> > amdgcn.internalize
> load of llvm.amdgcn.abi.version is being eliminated with cc1, -O3, and
> mlink-builtin-bitcode of device lib.
It seems the load is being eliminated by IPSCCP. That makes sense, since the variable is a weak_odr constant without externally_initialized. Either changing it to weak or adding externally_initialized will keep the load. Normal `__constant__` variables in device code may be changed by host code, so they are emitted with externally_initialized and their loads are not eliminated.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139730/new/

https://reviews.llvm.org/D139730

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits