tianshilei1992 added a comment.

In general, we're moving toward compiling the target-specific implementation along with user code, which is fantastic. This way, we only need to provide one bitcode library per target. However, the front-end (FE) change is somewhat inefficient: if the user code has multiple files, the target-specific header is included multiple times and therefore compiled multiple times. A more efficient approach is to change the driver's workflow, probably as follows:
1. Compile target implementation `t.bc`
2. Link `t.bc` and `libomptarget-[arch].bc` to `libomptarget.bc`
3. Compile user code, which is also multiple steps. `libomptarget.bc` is fed into FE in this step.
4. Remaining steps...

================
Comment at: clang/lib/Driver/ToolChains/Clang.cpp:1204
+  {
+    auto *CTC = static_cast<const toolchains::CudaToolChain *>(
+        C.getSingleOffloadToolChain<Action::OFK_Cuda>());
----------------
JonChesterfield wrote:
> Logic very like this could pick out a second, small devicertl bitcode library

Can we just use one header with different macros, like what we're using now?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D95313/new/

https://reviews.llvm.org/D95313

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits