[PATCH] D127901: [LinkerWrapper] Add PTX output to CUDA fatbinary in LTO-mode

Artem Belevich via Phabricator via cfe-commits Wed, 22 Jun 2022 14:38:16 -0700

tra added a comment.

In D127901#3602771 <https://reviews.llvm.org/D127901#3602771>, @jdoerfert wrote:


> Do we want/need PTX? I do not, but I don't mind having it. Someone will ask 
> for it eventually.

Fair enough.

> However, if we embed bitcode via LTO we can use the
> single-linked PTX image for the whole module and include it in the
> fatbinary. This allows us to do the following and have it execute even
> without the correct architecture specified.
> `clang foo.cu -foffload-lto -fgpu-rdc --offload-new-driver -lcudart`

Then we do need a knob controlling whether we embed PTX or not. The 
default should be "off" IMO.
We currently have `--[no-]cuda-include-ptx=`, which we may be able to reuse for 
that purpose.

This raises another question: which GPU variant will we generate PTX for? 
One? All (if more than one is specified)? Only the ones specified by 
`--[no-]cuda-include-ptx=`?
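
To make the last option concrete, here is a minimal sketch of a per-architecture 
selection policy. It is not code from this patch or from clang: `PtxFlag`, 
`shouldEmbedPtx` and `ptxArchs` are hypothetical names, and the sketch assumes 
that later flags override earlier ones, that the value `all` matches every 
architecture, and that the default is "off" as proposed above.

```cpp
#include <string>
#include <vector>

// Hypothetical model of one `--[no-]cuda-include-ptx=<arch>` occurrence
// on the command line (not a clang data structure).
struct PtxFlag {
  bool include;     // true: --cuda-include-ptx=, false: --no-cuda-include-ptx=
  std::string arch; // e.g. "sm_70", or "all"
};

// Decide whether PTX would be embedded for `arch`.
// Assumption: flags are processed in order and the last matching one wins.
bool shouldEmbedPtx(const std::vector<PtxFlag> &flags, const std::string &arch,
                    bool defaultOn = false) {
  bool embed = defaultOn;
  for (const PtxFlag &f : flags)
    if (f.arch == "all" || f.arch == arch)
      embed = f.include;
  return embed;
}

// Filter the requested GPU architectures down to those that get PTX.
std::vector<std::string> ptxArchs(const std::vector<std::string> &archs,
                                  const std::vector<PtxFlag> &flags,
                                  bool defaultOn = false) {
  std::vector<std::string> result;
  for (const std::string &arch : archs)
    if (shouldEmbedPtx(flags, arch, defaultOn))
      result.push_back(arch);
  return result;
}
```

Under these assumptions, building for `sm_60` and `sm_70` with only 
`--cuda-include-ptx=sm_70` given would embed PTX for `sm_70` alone, and passing 
no flag at all would embed none.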


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D127901/new/

https://reviews.llvm.org/D127901
