tra added a subscriber: wash.
tra added a comment.

In D79526#2027470 <https://reviews.llvm.org/D79526#2027470>, @yaxunl wrote:

> For implicit host device functions, since they are not guaranteed to work in 
> device compilation, we can only resolve them as if they are host functions. 
> This causes asymmetry but implicit host device functions are originally host 
> functions so it is biased toward host compilation in the beginning.


I don't think the assertion that `implicit host device functions are 
originally host functions` is always true. While in practice most such 
functions may indeed come from existing host code (e.g. the standard library), 
I don't see any inherent reason why they can't come from code written for the 
GPU. E.g. thrust is likely to have some implicitly HD functions in code that 
was never intended for CPUs, and in that case your assumption is wrong. Even if 
no such case exists today, it would not be unreasonable for users to have such 
code on the device side.
This overload resolution difference is observable, and it will likely create 
new corner cases in sufficiently convoluted C++ code.
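
For illustration only (a hypothetical sketch, not code from this patch): 
because clang by default treats constexpr functions as implicitly `__host__ 
__device__`, a header written purely with device code in mind still produces 
implicit HD functions:

  // Hypothetical device-oriented header. No CUDA attributes are spelled out,
  // yet the constexpr helper is implicitly __host__ __device__ under clang's
  // default rules, even though it was written for GPU use only.
  constexpr float squared_norm(float x, float y) {
    return x * x + y * y;
  }

  __global__ void scale_kernel(float *v) {
    float n = squared_norm(v[0], v[1]);  // the intended (device-side) use
    v[0] /= n;
    v[1] /= n;
  }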

I think we need something more principled than "happens to work for existing 
code".

> Only the original resolution guarantees no other issues. For example, in the 
> failed compilation in TF, some ctor of std::atomic becomes an implicit host 
> device function because it is constexpr. We should treat it as wrong-sided in 
> device compilation, but as same-sided in host compilation, otherwise it 
> changes the resolution in host compilation and causes other issues.

It may be true for atomic, where we do need a GPU-specific implementation. 
However, I can also see classes with constexpr constructors that are perfectly 
usable on both sides and do not have to be treated as wrong-sided.
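
As a hypothetical sketch of that situation (made-up names, not from the 
patch): a trivial value type whose constexpr constructor is implicitly HD 
works fine from both host and device code, so always resolving it as if it 
were wrong-sided on the device would be overly pessimistic:

  // The constexpr constructor is implicitly __host__ __device__ and is
  // equally valid on either side.
  struct Vec2 {
    float x, y;
    constexpr Vec2(float xx, float yy) : x(xx), y(yy) {}
  };

  __global__ void device_user(float *out) { Vec2 v(1.f, 2.f); *out = v.x + v.y; }
  void host_user(float *out)              { Vec2 v(3.f, 4.f); *out = v.x + v.y; }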

TBH, I do not see any reasonable way to deal with this under the current 
approach to handling HD functions. This patch and its base do improve things 
somewhat, but it all comes at the cost of further complexity and potentially 
paints us even deeper into a corner. The current behavior is already rather 
hard to explain.

Some time back @wash from NVIDIA was asking about improving HD function 
handling. Maybe it's time for all interested parties to get together and come 
up with a better solution. Not in this patch, obviously.



================
Comment at: clang/test/SemaCUDA/function-overload.cu:471-477
+inline double callee(double x);
+#pragma clang force_cuda_host_device begin
+inline void callee(int x);
+inline double implicit_hd_caller() {
+  return callee(1.0);
+}
+#pragma clang force_cuda_host_device end
----------------
These tests only verify that the code compiles, but that does not guarantee 
that we've picked the correct overload.
You should give the callees different return types and assign the result to a 
variable of the intended type. See `test_host_device_calls_hd_template()` on 
line 341 for an example.
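
A minimal sketch of that pattern applied to this snippet (assuming the intent 
is that `callee(double)` should win from the implicit HD caller in host 
compilation): force the call result into the intended type so that picking the 
wrong overload fails to compile instead of passing silently:

  inline double callee(double x);
  #pragma clang force_cuda_host_device begin
  inline void callee(int x);
  inline double implicit_hd_caller() {
    double ret = callee(1.0);  // compiles only if callee(double) is selected
    return ret;
  }
  #pragma clang force_cuda_host_device end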


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79526/new/

https://reviews.llvm.org/D79526


