rjmccall added inline comments.
================
Comment at: clang/lib/Sema/SemaOverload.cpp:9749
+ if (isBetterMultiversionCandidate(Cand1, Cand2))
+ return true;
+
----------------
erichkeane wrote:
> yaxunl wrote:
> > echristo wrote:
> > > rjmccall wrote:
> > > > yaxunl wrote:
> > > > > rjmccall wrote:
> > > > > > If we move anything below this check, it needs to figure out a
> > > > > > tri-state so that it can return false if `Cand2` is a better
> > > > > > candidate than `Cand1`. Now, that only matters if multiversion
> > > > > > functions are supported under CUDA, but if you're relying on them
> > > > > > not being supported, that should at least be commented on.
> > > > > Multiversion host functions are orthogonal to CUDA and therefore
> > > > > should be supported. Multiversioning of device, host device, and
> > > > > global functions is not supported. However, this change does not make
> > > > > things worse, and it should continue to work if they are supported.
> > > > >
> > > > > Host/device-based overload resolution is mostly for determining the
> > > > > viability of a function. If two functions are both viable, other
> > > > > factors should take precedence. This general rule has been taken for
> > > > > cases other than multiversion; I think it should also apply to
> > > > > multiversion.
> > > > >
> > > > > I will make `isBetterMultiversionCandidate` return one of three states.
> > > > > This general rule has been taken for cases other than multiversion;
> > > > > I think it should also apply to multiversion.
> > > >
> > > > Well, but the multiversion people could say the same: that
> > > > multiversioning is for picking an alternative among otherwise-identical
> > > > functions, and HD and H functions are not otherwise-identical.
> > > >
> > > > CC'ing @echristo for his thoughts on the right ordering here.
> > > Adding @erichkeane here as well.
> > >
> > > I think this makes sense, but I can see a reason to multiversion a
> > > function that will run on both host and device. A version of some matrix
> > > multiply that takes advantage of three host architectures and one CUDA
> > > one? Am I missing something here?
> > My understanding is that a multiversion function is for a specific
> > CPU (or GPU). Let's say we want to have a function f for gfx900, gfx906,
> > sandybridge, and ivybridge; shouldn't they be more like
> >
> > ```
> > __host__ __attribute__((cpu_specific(sandybridge))) f();
> > __host__ __attribute__((cpu_specific(ivybridge))) f();
> > __device__ __attribute__((cpu_specific(gfx900))) f();
> > __device__ __attribute__((cpu_specific(gfx906))) f();
> > ```
> > instead of all `__device__ __host__` functions?
> IMO, it doesn't make sense for functions to be BOTH host and device; they'd
> have to be just one. Otherwise I'm not sure how the resolver behavior is
> supposed to work. The whole idea is that the definition is chosen at runtime.
>
> Unless `__host__ __device__ void foo();` is TWO declaration chains (meaning
> two separate AST entries), it doesn't make sense to have multiversioning on
> it (and then, how it would be spelled is awkward/confusing to me).
>
> In the above case, if those 4 declarations are not 2 separate root AST
> nodes, multiversioning won't work.
There are certainly functions that ought to be usable from either host or
device context — any inline function that just does ordinary language things
should be in that category. Also IIUC many declarations are *inferred* to be
`__host__ __device__`, or can be mass-annotated with pragmas, and those reasons
are probably the main ones this might matter — we might include a header in
CUDA mode that declares a multi-versioned function, and we should handle it
right.
My read of how CUDA programmers expect this to work is that they see the
`__host__` / `__device__` attributes as primarily a mechanism for catching
problems where you're using the wrong functions for the current configuration.
That is, while we allow overloading by `__host__`/`__device__`-ness, users
expect those attributes to mostly be used as a filter for what's "really there"
rather than really strictly segregating the namespace. So I would say that
CUDA programmers would probably expect the interaction with multiversioning to
be:
- Programmers can put `__host__`, `__device__`, or both on a variant depending
on where it was usable.
- Dispatches should simply ignore any variants that aren't usable for the
current configuration.
And specifically they would not expect e.g. a `__host__` dispatch function to
only consider `__host__` variants — it should be able to dispatch to anything
available, which is to say, it should also include `__host__ __device__`
variants. Similarly (and probably more usefully), a `__host__ __device__`
dispatch function being compiled for the device should also consider pure
`__device__` functions, and so on.
If we accept that, then I think it gives us a much better idea for how to
resolve the priority of the overload rules. The main impact of
`isBetterMultiversionCandidate` is to try to ensure that we're looking at the
`__attribute__((cpu_dispatch))` function instead of one of the
`__attribute__((cpu_specific))` variants. (It has no effect on
`__attribute__((target))` multi-versioning, mostly because it doesn't need to:
target-specific variants don't show up in lookup with
`__attribute__((target))`.) That rule should take precedence over the CUDA
preference for exact matches, because e.g. if we're compiling this:
```
__host__ __device__ int magic(void) __attribute__((cpu_dispatch("...")));
__host__ __device__ int magic(void) __attribute__((cpu_specific(generic)));
__host__ int magic(void) __attribute__((cpu_specific(mmx)));
__host__ int magic(void) __attribute__((cpu_specific(sse)));
__device__ int magic(void) __attribute__((cpu_specific(some_device_feature)));
__device__ int magic(void)
__attribute__((cpu_specific(some_other_device_feature)));
```
then we don't want the compiler to prefer a CPU-specific variant over the
dispatch function just because one of the variants was marked `__host__`.
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D77954/new/
https://reviews.llvm.org/D77954
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits