[PATCH] D47394: [OpenMP][Clang][NVPTX] Replace bundling with partial linking for the OpenMP NVPTX device offloading toolchain

Samuel Antao via Phabricator via cfe-commits Tue, 29 May 2018 08:37:11 -0700

sfantao added a comment.

@gtbercea, thanks for the patch.


> INTEROPERABILITY WITH OTHER COMPILERS: These bundled object files can end up 
> being passed between Clang and other compilers which may lead to 
> incompatibilities: passing a bundled file from Clang to another compiler 
> would lead to that compiler not being able to unbundle it. Passing an 
> unbundled object file to Clang and therefore Clang not knowing that it 
> doesn't need to unbundle it.

I don't understand the interoperability issue here. Clang convention to create 
the offloading records in the host binary is specific to clang and was 
discussed, defined and documented. We do not expect other compilers to 
understand it as we don't expect clang to understand that from other compilers. 
In terms of the binary itself, it is an ELF file that can be understood by 
other compilers/tools - albeit the linking would probably fail if the device 
section addresses are not defined.

I think that what you mean here is that by forcing nvcc as the host linker when 
one of the targets is a NVPTX target, you shift the (un)bundling part to nvlink 
and you will be compatible with host compilers supported by nvcc.

I understand that at the moment there is no immediate need for combining 
multiple targets, and there is an immediate need to support archives with 
offloading code. Therefore, changing the host linker in function of the offload 
target seems reasonable as that is what would help most users.

I just want to note that the separation of the toolchains in the driver and the 
support for multiple toolchains in the same binary were part of the reason we 
converged to the current design. The idea was to have a generic driver with all 
the details relative to bundling formats handled by a separate tool, the 
bundler. Of course the requirements can be reviewed at any time and priorities 
can change. However, I think it would be cleaner to have the nvlink compatible 
binary generated by the bundler in the current design. Just my two cents.



================
Comment at: include/clang/Driver/Compilation.h:126
+  /// Whether the clang-offload-bundler can be skipped.
+  bool skipOffloadBundler = false;
+
----------------
Use CamelCase for class local variables.


================
Comment at: include/clang/Driver/Compilation.h:312
+  /// \param skipBundler - bool value set once by the driver.
+  void setSkipOffloadBundler(bool skipBundler);
+
----------------
Why is this a property of the compilation and not of a set of actions referring 
to a given target? That would allow one to combine in the same compilation 
targets requiring the bundler and targets that wouldn't. 


================
Comment at: lib/Driver/Compilation.cpp:276
+void Compilation::setSkipOffloadBundler(bool skipBundler) {
+  skipOffloadBundler = skipBundler;
+}
----------------
Given the logic you have below, you are assuming this is not set to false ever. 
It would be wise to get an assertion here in case you end up having toolchains 
skipping and others don't. If that is just not supported a diagnostic should be 
added instead.

The convention is that local variables use CamelCase.


================
Comment at: lib/Driver/Driver.cpp:2927
+    if (OpenMPTargets->getValues().size() > 0) {
+      unsigned triplesRequiringBundler = 0;
+      for (const char *Val : OpenMPTargets->getValues()) {
----------------
CamelCase


================
Comment at: lib/Driver/Driver.cpp:2943
+    }
+  }
+
----------------
Can you just implement this check in the definition of `Compilation: 
canSkipClangOffloadBundler` and get rid of `setSkipOffloadBundler`? All the 
requirted information is already in `Compilation` under `C.getInputArgs()`.


================
Comment at: lib/Driver/Driver.cpp:2962
+    // is ld instead but this needs to be replaced.
+    canDoPartialLinking = LinkerName.endswith("/ld");
+  }
----------------
In the current implementation there is no distinction between what is meant for 
Windows/Linux. This check would only work on Linux and the test below would 
fail for bots running windows.

Also, I think it makes more sense to have this check part of the `Toolchain` 
instead of doing it in the driver. The `Toolchain` definition knows the names 
of the third-party executables, the driver doesn't. 


================
Comment at: lib/Driver/ToolChains/Clang.cpp:5504
+  StringRef LinkerName = getToolChain().GetLinkerPath();
+  assert(LinkerName.endswith("/ld") && "Partial linking not supported.");
+
----------------
I believe this check should be done when the toolchain is created with all the 
required diagnostics. What happens if the linker does not support partial 
linking?


================
Comment at: lib/Driver/ToolChains/Cuda.cpp:536
+  }
 }
 
----------------
What prevents all this from being done in the bundler? If I understand it 
correctly, if the bundler implements this wrapping all the checks for 
librariers wouldn't be required and, only two changes would be required in the 
driver:

- generate fatbin instead of cubin. This is straightforward to do by changing 
the device assembling job. In terms of the loading of the kernels by the device 
API, doing it through fatbin or cubin should be equivalent except that fatbin 
enables storing the PTX format and JIT for newer GPUs.
- Use NVIDIA linker as host linker.

This last requirement could be problematic if we get two targets attempting  to 
use different (incompatible linkers). If we get this kind of incompatibility we 
should get the appropriate diagnostic.


================
Comment at: test/Driver/openmp-offload.c:497
 // RUN:   %clang -###  -fopenmp=libomp -o %t.out -lsomelib -target 
powerpc64le-linux 
-fopenmp-targets=powerpc64le-ibm-linux-gnu,x86_64-pc-linux-gnu %t.i 
-no-canonical-prefixes 2>&1 \
 // RUN:   | FileCheck -check-prefix=CHK-UBJOBS %s
 // RUN:   %clang -### -fopenmp=libomp -o %t.out -lsomelib -target 
powerpc64le-linux 
-fopenmp-targets=powerpc64le-ibm-linux-gnu,x86_64-pc-linux-gnu %t.i -save-temps 
-no-canonical-prefixes 2>&1 \
----------------
We need a test for the static linking. The host linker has to be nvcc in that 
case, right?


Repository:
  rC Clang

https://reviews.llvm.org/D47394



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D47394: [OpenMP][Clang][NVPTX] Replace bundling with partial linking for the OpenMP NVPTX device offloading toolchain

Reply via email to