yaxunl added a comment.

In D101630#2729975 <https://reviews.llvm.org/D101630#2729975>, @tra wrote:

> What will happen with this patch in the following scenarios:
>
> - `--offload-arch=A -S -o out.s`
> - `--offload-arch=A --offload-arch=B -S -o out.s`
>
> I would expect the first case to produce a plain text assembly file. With
> this patch, the second case will produce a bundle. With some build tools,
> users only append to the compiler options provided by the system. Depending
> on whether those system-provided options include an `--offload-arch`, the
> format of the output in the first example becomes unstable. The consistent
> approach would be to always bundle everything, but that breaks (or at least
> complicates) the normal single-output case and makes it deviate from what
> users expect from a regular C++ compilation.
>
> In D101630#2729768 <https://reviews.llvm.org/D101630#2729768>, @yaxunl wrote:
>
>> We use ccache and need a single output for `-E` with device compilation.
>> There are also use cases that emit bitcode for device compilation and link
>> it later. These use cases require the output to be bundled.
>
> This is a good point. I don't think I've ever used ccache on a CUDA 
> compilation, but I see how ccache may get surprised.
>
> Considering the scenario above, I think a better way to handle it would be to
> teach ccache about CUDA/HIP compilation. The situation is similar to
> split DWARF support, where the compiler does something beyond the expected
> one-input-to-one-output transformation.
> E.g., we could tell ccache to use stdout for `-E`, or implement a
> `bundle-everything` flag in clang and let ccache use it when it needs a
> single output.
>
>> If users want to get the unbundled output, they can use `-save-temps`. Is
>> that sufficient?
>
> In terms of saving intermediate outputs, yes. In terms of usability, no.
> Sometimes I want one particular intermediate result saved with an exact
> filename (or piped to stdout), and saving a bunch of them and then picking
> one would be a pretty annoying usability regression for me.

How about an option, `-fhip-bundle-device-output`? If it is on, device output
is bundled no matter how many GPU archs there are. It is on by default.
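To make the stability argument concrete, here is a toy Python model of the two output-format rules. It is not clang driver code, and the function names `bundled_with_current_patch` and `bundled_with_proposed_flag` are hypothetical. The first function encodes the behavior tra describes for the patch as posted: one `--offload-arch` gives a plain file, and two or more give a bundle. The second encodes the proposed `-fhip-bundle-device-output` flag, on by default. The comment above does not say what happens with the flag off and several archs writing to a single `-o`, so the sketch simply rejects that case as a labeled assumption.

```python
from typing import Sequence


def bundled_with_current_patch(offload_archs: Sequence[str]) -> bool:
    # Hypothetical model of the rule tra describes: a single GPU arch
    # produces a plain output file, two or more produce a bundle.
    return len(offload_archs) > 1


def bundled_with_proposed_flag(offload_archs: Sequence[str],
                               bundle_device_output: bool = True) -> bool:
    # Proposed -fhip-bundle-device-output, on by default: device output
    # is bundled no matter how many GPU archs are given.
    if bundle_device_output:
        return True
    # ASSUMPTION (not stated in the review): with the flag off, several
    # archs cannot share one unbundled -o file, so this sketch rejects it.
    if len(offload_archs) > 1:
        raise ValueError("unbundled output for multiple archs needs separate files")
    return False


if __name__ == "__main__":
    one_arch = ["A"]
    two_archs = ["A", "B"]
    # Patch as posted: a system-provided extra arch flips the format.
    print(bundled_with_current_patch(one_arch), bundled_with_current_patch(two_archs))
    # Proposed flag left at its default: the format stays the same.
    print(bundled_with_proposed_flag(one_arch), bundled_with_proposed_flag(two_archs))
```

Running it prints `False True` and then `True True`. The first line reproduces the instability tra points out, and the second shows why a default-on bundling flag keeps the format independent of how many archs the build system adds.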



https://reviews.llvm.org/D101630

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
