gtbercea added a comment.

The error is related to lack of device linking, just like you explained two 
paragraphs down. This is the error I get:

  main.o: In function `__cuda_module_ctor':
  main.cu:(.text+0x674): undefined reference to `__cudaRegisterLinkedBinary__nv_c5b75865'

You hit the nail on the head: the device linking step is the tricky bit.
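
For anyone following along, the missing step can be sketched with nvcc's 
documented separate-compilation flags (-dc and -dlink). This is only an 
illustration of device linking in general, not the clang OpenMP toolchain this 
patch touches; the file names, the kernel and the temporary directory are 
hypothetical, and the block does nothing when nvcc is not installed.

```shell
# Sketch only: plain nvcc separate compilation, not the clang driver.
if command -v nvcc >/dev/null 2>&1; then
  work=$(mktemp -d) && cd "$work"

  # Device function defined in one translation unit...
  cat > kernel.cu <<'EOF'
__device__ int twice(int x) { return 2 * x; }
EOF

  # ...and used from another, which requires relocatable device code.
  cat > main.cu <<'EOF'
extern __device__ int twice(int x);
__global__ void k(int *out) { *out = twice(21); }
int main() {
  int *d;
  cudaMalloc(&d, sizeof(int));
  k<<<1, 1>>>(d);
  cudaDeviceSynchronize();
  cudaFree(d);
  return 0;
}
EOF

  # Compile each file to an object holding relocatable device code.
  nvcc -dc main.cu -o main.o
  nvcc -dc kernel.cu -o kernel.o

  # Device link step: resolves device symbols across the objects and emits
  # an extra host object (dlink.o) for the final host link.
  nvcc -dlink main.o kernel.o -o dlink.o
fi
```

The host link then takes main.o, kernel.o and dlink.o together with the CUDA 
runtime library. Handing only main.o and kernel.o to the host linker is the 
situation that leaves a `__cudaRegisterLinkedBinary_*' symbol unresolved.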

The OpenMP toolchain has the advantage that it already calls NVLINK (upstreamed 
a long time ago). This patch doesn't change that. What it "fixes" (for lack of 
a better word) is the way in which objects are created on the device side. By 
adding the FATBINARY + CLANG++ steps to the device toolchain, I ensure that the 
existing call to NVLINK can "detect" the device part of individual or archived 
objects. That is not the case with today's compiler, where NVLINK cannot do 
this for archived objects (static libs).

In general, for offloading toolchains, I don't see the reliance on 
vendor-specific tools as a problem **if and only if** the calls to 
vendor-specific tools remain confined to a device-specific toolchain. This 
patch respects this condition. All the calls to CUDA tools in this patch are 
part of the OpenMP NVPTX device offloading toolchain (which is an NVPTX 
device-specific toolchain).

The only host-side change is the call to "ld -r" which replaces a call to the 
"openmp-offload-bundler" tool.


Repository:
  rC Clang

https://reviews.llvm.org/D47394



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
