yaxunl added inline comments.

================
Comment at: clang/test/CodeGenCUDA/amdgpu-code-object-version-linking.cu:12
+// RUN: llvm-link %t_0 %t_5 -o -| llvm-dis -o - | FileCheck -check-prefix=LINKED5 %s
+
+#include "Inputs/cuda.h"
----------------
Need to test using clang -cc1 with -O3 and -mlink-builtin-bitcode to link the 
device lib, and verify that the load of llvm.amdgcn.abi.version is eliminated 
after optimization.

I think that currently does not happen, since llvm.amdgcn.abi.version is not 
internalized by the internalization pass. This can cause significant perf 
drops, since the load is expensive. We need to either tweak the function that 
controls which variables can be internalized for amdgpu so that this variable 
gets internalized, or add a generic way to tell that function which variables 
should be internalized, e.g. by adding a metadata amdgcn.internalize.
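
A possible shape for the requested test, as a rough sketch only. It assumes 
that %t_0 is the device-lib bitcode produced by an earlier RUN line, that 
-mcode-object-version=5 is accepted by -cc1 in this patch, and that the 
variable appears in the IR as a global named llvm.amdgcn.abi.version. The OPT 
prefix is a placeholder chosen for illustration.

```cuda
// Hypothetical RUN lines: compile the user code at -O3 while linking the
// device-lib bitcode, then check that no load of the ABI version survives.
// RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -emit-llvm -O3 \
// RUN:   -mcode-object-version=5 -mlink-builtin-bitcode %t_0 -o - %s \
// RUN:   | FileCheck -check-prefix=OPT %s

// If the variable is internalized and constant-folded, this should hold.
// OPT-NOT: load {{.*}}@llvm.amdgcn.abi.version
```

If the variable is not internalized, the OPT-NOT line would be expected to 
fail, which is the regression this test is meant to catch.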


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139730/new/

https://reviews.llvm.org/D139730
