tra added a comment.

In D94337#2492108 <https://reviews.llvm.org/D94337#2492108>, @rgreenblatt wrote:

> In D94337#2491825 <https://reviews.llvm.org/D94337#2491825>, @tra wrote:
>
>> 'Works' is not exactly the same as 'works correctly'. This example makes 
>> `a()` look like a regular host function, instead of the kernel, and that 
>> affects how the rest of the TU get parsed.
>> I.e. you'll have further errors if somewhere down below the file has 
>> `a<<<1,1>>>()`. Similar story with ignoring `__host__` and `__device__` 
>> attributes -- that may lead to various overload resolution errors, or 
>> reporting conflicting redeclarations/redefinitions for the perfectly valid 
>> host/device function overloads. The list goes on.
>
> No, this part is definitely working, the full set of SDK headers is included. 
> I have confirmed this by compiling a .cu and a .cuh file with `-E` and 
> checking that the output is identical. Further, I have confirmed that 
> `global<<<_, _>>>()` builds, that `__host__` only functions can't be used on 
> the device, and that `__device__` functions can't be used on the host.

We were talking about the case where clang can't expand CUDA-specific macros. 
It works in your specific case because you do have CUDA installed and compiler 
did find it. It's not necessarily the case for everyone everywhere. We want 
clang to work for tooling even when CUDA SDK is not installed on the machine at 
all. My argument was that without a sufficient set of CUDA macros clang can't 
parse CUDA sources correctly. CUDA SDK is currently the only source of those 
macros. We need clang to provide its own for the cases when CUDA SDK is not 
available.

Try compiling that source with `-nocudainc` which will illustrate what happens 
when CUDA SDK is not found. E.g. https://godbolt.org/z/MoKvfq

> To fix this, I added a new type `TY_CUDAHeader_DEVICE`. This type precompiles 
> a header for a specific device architecture. 
> This type is used in place of `TY_CUDA_DEVICE` in the appropriate case. 
> I think having a header device type is the correct approach if a separate 
> header type is used.

I think this kind of compilation pipeline restructuring is a bit premature. We 
need to address being able to parse CUDA headers reliably first.

There's also a question of whether it's necessary. It's not very useful for the 
Driver functionality itself and can be implemented in the tooling by explicitly 
telling the driver what we want. Your patch appears to implement it the other 
way around -- the tooling relies on the driver to implement the CUDA header 
handling magic. I think the right approach is to teach tooling how to handle 
CUDA and keep the clang driver changes to a minimum. Based on our previous 
conversation, se seem to need only `-x cuda-header`. Everything else can be 
controlled by the tooling with `--cuda-host-only`/`--cuda-device-only 
--cuda-gpu-arch=XXX`. Considering that tooling can't deal with more than one 
sub-compilation, using these flags is going to be necessary anyways.

> @tra I am guessing you have already thought about this, but
> one thing which is worth noting is that language servers 
> only maintain a single AST per file (and this probably won't change).

Using AST from the host-side compilation was an easy-to-do trade-off. It works 
reasonably well most of the time, at least for the sources we have exact 
compilation flags for. Re-engineering tooling to deal with multiple ASTs per TU 
just for CUDA may not be worth it.

> This is the host AST of course.
> For example, in the above program language servers
> will only have a diagnostic for `#pragma message "host"`.
> So, the values of `--cuda-gpu-arch` aren't relevant for language servers
> beyond determining what the preprocessor includes.
> However, the fact that the preprocessor includes depend on arch
> means that the __clang_cuda_standalone_defs.h approach won't always be 
> perfectly
> correct.

Yes, that's exactly why I mentioned before that ideally we may need multiple 
ASTs as each device compilation will have a slightly different one.  The goal 
of `__clang_cuda_standalone_defs.h` is to make it possible to parse CUDA 
sources at all w/o having to rely on CUDA SDK. It is not intended to solve the 
multiple-AST issue.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D94337/new/

https://reviews.llvm.org/D94337

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to