yaxunl added a comment.

In D101630#2791734 <https://reviews.llvm.org/D101630#2791734>, @tra wrote:

> In D101630#2787714 <https://reviews.llvm.org/D101630#2787714>, @yaxunl wrote:
>
>> How does nvcc --genco behave when there are multiple GPU arch's? Does it
>> output a fat binary containing multiple ISA's? Also, does it support
>> device-only compilation for intermediate outputs?
>
> It does not allow multiple outputs for `-ptx` and `-cubin` compilations, same
> as clang behaves now:
>
>   $ ~/local/cuda-11.3/bin/nvcc -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_70,code=sm_70 -ptx foo.cu
>   nvcc fatal : Option '--ptx (-ptx)' is not allowed when compiling for multiple GPU architectures
>
> NVCC does allow `-E` with multiple targets, but it does produce output for
> only *one* of them.
>
> NVCC does bundle outputs for multiple GPU variants if `-fatbin` is used.

I think that for intermediate outputs (e.g. preprocessor expansion, IR, and
assembly), it probably makes sense not to bundle by default. However, for the
default action (emitting an object), we need to bundle by default, since that
was the old behavior and existing HIP apps depend on it. We can then allow
-fhip-bundle-device-output to override the default behavior.
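To make the proposed defaults concrete, here is a rough sketch of the decision logic. It is not the actual driver code: `OutputKind`, `shouldBundleDeviceOutput`, and the tri-state `bundleOverride` are hypothetical names used only for illustration, and treating -fhip-bundle-device-output as having both an "on" and an "off" form is an assumption.

```cpp
#include <optional>

// Hypothetical output kinds for a device-only compilation (illustration only).
enum class OutputKind { Preprocessed, IR, Assembly, Object };

// bundleOverride models an explicit -fhip-bundle-device-output setting:
//   true / false -> the user asked for bundling on / off (assumed forms)
//   std::nullopt -> the option was not given, so the default applies
constexpr bool shouldBundleDeviceOutput(OutputKind kind,
                                        std::optional<bool> bundleOverride) {
  // An explicit user choice always wins over the default.
  if (bundleOverride.has_value())
    return *bundleOverride;
  // Default: bundle only for the default action (emitting an object), which
  // keeps the old behavior; leave intermediate outputs unbundled.
  return kind == OutputKind::Object;
}

// Compile-time checks of the defaults and the override.
static_assert(shouldBundleDeviceOutput(OutputKind::Object, std::nullopt));
static_assert(!shouldBundleDeviceOutput(OutputKind::Assembly, std::nullopt));
static_assert(shouldBundleDeviceOutput(OutputKind::IR, true));
static_assert(!shouldBundleDeviceOutput(OutputKind::Object, false));
```

Keeping the override as a tri-state means "option not given" stays distinguishable from "explicitly off", so the old object-emitting behavior is preserved unless the user opts out.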