[PATCH] D101630: [HIP] Fix device-only compilation

Artem Belevich via Phabricator via cfe-commits Tue, 01 Jun 2021 14:24:20 -0700

tra added a comment.

In D101630#2792052 <https://reviews.llvm.org/D101630#2792052>, @yaxunl wrote:


> I think for intermediate outputs e.g. preprocessor expansion, IR, and 
> assembly, probably it makes sense not to bundle by default.

Agreed.

> However, for default action (emitting object), we need to bundle by default 
> since it was the old behavior and existing HIP apps depend on that.

Existing use is a valid point.
As a counterargument, I would suggest that in a compilation pipeline which does 
include bundling, an object file for one GPU variant *is* an intermediate 
output, similar to the ones you've listed above.

The final product of device-side subcompilations is a bundle. The question is
what `-c` means: is it `produce an object file` or `compile till the end of
the pipeline`?
For CUDA and HIP compilation it's ambiguous. When we target just one GPU, it
would be closer to the former. In general, it would be closer to the latter.
NVCC side-steps the issue by using different flags, `-cubin/-fatbin`, to
disambiguate between the two cases and to avoid bolting CUDA-related semantics
onto compiler flags that were not designed for that.

> Then we allow -fhip-bundle-device-output to override the default behavior.

OK. Bundling objects for HIP by default looks like a reasonable compromise.
It would be useful to generalize the flag to `-fgpu-bundle...`, as we would
need it if/when we want to produce a fatbin during CUDA compilation. I'd still
keep no-bundling as the default for CUDA's objects.
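
To make the defaults discussed so far concrete, here is a minimal Python sketch
of the policy as I read it. It is only a model of the decision, not driver
code: the function name, the output-kind strings, and the `bundle_flag`
parameter are all hypothetical, and letting the flag override intermediate
outputs too is an assumption on my part.

```python
from typing import Optional

# Outputs treated as intermediate in the discussion above
# (illustrative names, not clang's internal action names).
INTERMEDIATE_OUTPUTS = {"preprocessed", "llvm-ir", "assembly"}


def bundles_device_output(language: str, output_kind: str,
                          bundle_flag: Optional[bool] = None) -> bool:
    """Return True if device-only output would be bundled.

    bundle_flag models an explicit user override (the proposed
    -fgpu-bundle... flag); None means the flag was not given.
    """
    if bundle_flag is not None:
        # Assumption: an explicit flag always wins over the defaults.
        return bundle_flag
    if output_kind in INTERMEDIATE_OUTPUTS:
        # Intermediate outputs are not bundled by default.
        return False
    # Default action (emitting an object): HIP bundles, CUDA does not.
    return language == "hip"
```

With this model, `bundles_device_output("hip", "object")` is true while
`bundles_device_output("cuda", "object")` is false, which is exactly the
per-language difference in defaults that the flag would have to paper over.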

Now that we are in agreement on what we want, the next question is *how* we
want to do it.

There appears to be a fair bit of similarity between what the proposed
`-fgpu-bundle` flag does and the handful of `--emit-...` options clang has now.
If we were to use something like `--emit-gpu-object` and `--emit-gpu-bundle`,
it would be similar to NVCC's `-cubin/-fatbin`. It would also decouple the
default behavior for `-c --cuda-device-only` from the user's ability to specify
what they want, without burdening `-c` with additional flags that would have
different defaults under different circumstances.

Compilation with `-c` would remain "compile till the end", whatever that
happens to mean for a particular language, and
`--emit-gpu-object/--emit-gpu-bundle` would tell the compiler how far we want
it to proceed and what kind of output we want.
This would probably be easier to explain to users, as they are already
familiar with flags like `-emit-llvm`; only now we are dealing with an extra
bundling step in the compilation pipeline. It would also behave consistently
across CUDA and HIP even though they have different defaults for bundling in
device-side compilation. E.g. `-c --cuda-device-only --emit-gpu-bundle`
will always produce a bundle with the object files for both CUDA and HIP, and
`-c --cuda-device-only --emit-gpu-object` will always require a single '-o'
output.
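
As a rough illustration of the proposed semantics, the sketch below models how
the output kind would be chosen for `-c --cuda-device-only`. The
`--emit-gpu-object` and `--emit-gpu-bundle` spellings are only the proposal
from this comment, not existing clang options, and the function and parameter
names are hypothetical.

```python
from typing import Optional


def device_only_output(language: str, emit: Optional[str] = None) -> str:
    """Model the output kind of `-c --cuda-device-only`.

    emit is None when no --emit-gpu-... flag is given, otherwise
    "gpu-object" or "gpu-bundle" (the proposed, hypothetical flags).
    """
    if emit == "gpu-bundle":
        # Explicit request: always a bundle, for CUDA and HIP alike.
        return "bundle"
    if emit == "gpu-object":
        # Explicit request: always a plain object file.
        return "object"
    if emit is not None:
        raise ValueError(f"unknown emit kind: {emit}")
    # No explicit request: "compile till the end", which differs by
    # language (HIP bundles by default, CUDA does not).
    return "bundle" if language == "hip" else "object"
```

The point of the split is visible in the first two branches: once the user
says what they want, the language no longer matters, and the per-language
default only applies when nothing was requested.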

WDYT? Does it make sense?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D101630/new/

https://reviews.llvm.org/D101630

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
