yaxunl accepted this revision.
yaxunl added a comment.

LGTM. Thanks



================
Comment at: clang/test/CodeGenCUDA/amdgpu-code-object-version-linking.cu:12
+// RUN: llvm-link %t_0 %t_5 -o -| llvm-dis -o - | FileCheck -check-prefix=LINKED5 %s
+
+#include "Inputs/cuda.h"
----------------
saiislam wrote:
> yaxunl wrote:
> > Need to test using clang -cc1 with -O3 and -mlink-builtin-bitcode to link 
> > the device lib and verify that the load of llvm.amdgcn.abi.version is 
> > eliminated after optimization.
> > 
> > I think currently it cannot do that since llvm.amdgcn.abi.version is not 
> > internalized by the internalization pass. This can cause significant 
> > perf drops since loading is expensive. We need to tweak the function 
> > controlling what variables can be internalized for amdgpu so that this 
> > variable gets internalized, or have a generic way to tell that function 
> > which variables should be internalized, e.g. by adding a metadata 
> > amdgcn.internalize.
> The load of llvm.amdgcn.abi.version is eliminated with cc1, -O3, and 
> -mlink-builtin-bitcode of the device lib.
It seems to be eliminated by IPSCCP. That makes sense, since the variable is 
a constant with weak_odr linkage and without externally_initialized. Either 
changing the linkage to weak or adding externally_initialized will keep the 
load. A normal `__constant__` variable in device code may be changed by host 
code, so such variables are emitted with externally_initialized and their 
loads are not eliminated.
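
To make the rule above concrete, here is a minimal, self-contained C++ sketch of the decision as I understand it. It is a simplified model, not LLVM's actual classes: the `GlobalVar` struct, the `Linkage` enum, and the function names are hypothetical stand-ins. The linkage and flags given to `userConstant` are assumptions chosen to isolate the effect of externally_initialized. The real check may consider more conditions than this.

```cpp
// Simplified model (NOT LLVM's real API) of when an optimizer may replace a
// load from a global variable with the value of its initializer.
enum class Linkage { Internal, External, WeakAny, WeakODR };

struct GlobalVar {
  Linkage linkage;
  bool isConstant;            // marked `constant` rather than `global`
  bool hasInitializer;        // a definition, not just a declaration
  bool externallyInitialized; // may be written from outside before use
};

// A plain weak definition may be replaced by a different definition at link
// time; a weak_odr one is assumed equivalent to any replacement.
constexpr bool isInterposable(Linkage l) { return l == Linkage::WeakAny; }

// The initializer can be trusted only if nothing can change it later.
constexpr bool hasDefinitiveInitializer(const GlobalVar &g) {
  return g.hasInitializer && !isInterposable(g.linkage) &&
         !g.externallyInitialized;
}

constexpr bool loadCanBeFolded(const GlobalVar &g) {
  return g.isConstant && hasDefinitiveInitializer(g);
}

// llvm.amdgcn.abi.version as described in the review: constant, weak_odr,
// no externally_initialized.
constexpr GlobalVar abiVersion{Linkage::WeakODR, true, true, false};
// The two alternatives mentioned in the review that keep the load.
constexpr GlobalVar abiVersionWeak{Linkage::WeakAny, true, true, false};
constexpr GlobalVar abiVersionExtInit{Linkage::WeakODR, true, true, true};
// A normal __constant__ variable; linkage and isConstant are assumptions
// here, only externallyInitialized matters for the result.
constexpr GlobalVar userConstant{Linkage::External, true, true, true};

static_assert(loadCanBeFolded(abiVersion), "load is eliminated");
static_assert(!loadCanBeFolded(abiVersionWeak), "weak keeps the load");
static_assert(!loadCanBeFolded(abiVersionExtInit), "ext-init keeps the load");
static_assert(!loadCanBeFolded(userConstant), "host may change the value");
```

The `static_assert` lines encode the four cases discussed in this comment, so the sketch checks itself at compile time.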


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139730/new/

https://reviews.llvm.org/D139730

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
