tra added a comment. LGTM overall, with few nits.
================ Comment at: clang/lib/Driver/ToolChains/Cuda.cpp:448-450 + // If we are invoking `nvlink` internally we need to output a `.cubin` file. + // Checking if the output is a temporary is the cleanest way to determine + // this. Putting this logic in `getInputFIlename` isn't an option because it ---------------- Can't say that I like this approach. It heavily relies on "happens to work". Perhaps a better way to deal with this is to create the temporary GPU object file name with ".cubin" extension to start with. ================ Comment at: clang/lib/Driver/ToolChains/Cuda.cpp:450 + // Checking if the output is a temporary is the cleanest way to determine + // this. Putting this logic in `getInputFIlename` isn't an option because it + // relies on the compilation. ---------------- typo: getInputFilename ================ Comment at: clang/lib/Driver/ToolChains/Cuda.cpp:469 - bool Relocatable = false; + bool Relocatable = true; if (JA.isOffloading(Action::OFK_OpenMP)) ---------------- Nit: I'd add an `else { Relocatable = false; // comment on what use cases use relocatable compilation by default. }` and leave it uninitialized here. At the very least it may be worth a comment. Silently defaulting to `true` makes me ask "what other compilation modes we may have?" and stand-alone compilation targeting NVPTX is hard to infer here from the surrounding details. ================ Comment at: clang/lib/Driver/ToolChains/Cuda.h:196-197 + + void AddCudaIncludeArgs(const llvm::opt::ArgList &DriverArgs, llvm::opt::ArgStringList &CC1Args) const override; ---------------- Nit: it's hard to tell whether the whitespace additions are spaces or tabs. They show up as ">" to me which suggests it may be tabs. Just in case it is indeed the case, please make sure to un-tabify the changes. ================ Comment at: clang/test/Driver/cuda-cross-compiling.c:37-38 +// ARGS: -cc1" "-triple" "nvptx64-nvidia-cuda" "-S" {{.*}} "-target-cpu" "sm_61" "-target-feature" "+ptx{{[0-9]+}}" {{.*}} "-o" "[[PTX:.+]].s" +// ARGS-NEXT: ptxas" "-m64" "-O0" "--gpu-name" "sm_61" "--output-file" "[[CUBIN:.+]].cubin" "[[PTX]].s" "-c" +// ARGS-NEXT: nvlink" "-o" "a.out" "-arch" "sm_61" {{.*}} "[[CUBIN]].cubin" + ---------------- This may fail on windows where ptxas/nvlink will be `ptxas.exe` `nvlink.exe`. I think we typically use something like `fatbinary{{.*}}"` in other tests. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D140158/new/ https://reviews.llvm.org/D140158 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits