JonChesterfield added inline comments.

================
Comment at: clang/lib/Driver/ToolChains/Clang.cpp:1204
+    {
+      auto *CTC = static_cast<const toolchains::CudaToolChain *>(
+          C.getSingleOffloadToolChain<Action::OFK_Cuda>());
----------------
Logic very like this could pick out a second, small devicertl bitcode library


================
Comment at: 
clang/lib/Headers/openmp_wrappers/__clang_openmp_devicertl_cuda_ge90.h:18
+#ifdef _OPENMP
+#define DEVICE __attribute__((used))
+#else
----------------
linkonce and linkonce_odr can both be discarded, but these symbols need to 
survive until devicertl is linked


================
Comment at: 
clang/lib/Headers/openmp_wrappers/__clang_openmp_devicertl_cuda_ge90.h:39
+  int WARPSIZE = 32;
+  int tmp = ((WARPSIZE - Width) << 8) | 0x1f;
+  return __nvvm_shfl_sync_down_i32(Mask, Var, Delta, tmp);
----------------
calling into intrinsics would remove this messy expression, if we can work out 
how to reliably compile parts of the cuda sdk as openmp

these functions probably can't be instantiated on the host, so when changing 
the devicertl build over to openmp we will also need to guard these with 
variant / macro


================
Comment at: openmp/libomptarget/deviceRTLs/amdgcn/src/target_impl.hip:49
 // Warp vote function
-DEVICE __kmpc_impl_lanemask_t __kmpc_impl_activemask() {
+EXTERN __kmpc_impl_lanemask_t __kmpc_impl_activemask() {
   return __builtin_amdgcn_read_exec();
----------------
Changing from c++ to c name mangling seems fine here. We don't consistently use 
one or the other in the devicertl. Using c mangling is simpler if the 
implementation is in a header.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D95313/new/

https://reviews.llvm.org/D95313

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to