[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

Alex Voicu via cfe-commits Wed, 02 Jul 2025 06:11:39 -0700

AlexVlx wrote:

> > Definitely, more than happy to have a 1-on-1 (2-on-1 even, since I think 
> > @AaronBallman also suggested something along these lines as well :) ).
> 
> Please email me with some times that will work for you.
> 
> > We've just made the call to foo() illegal on anything that is not gfx9000
> 
> I... don't think I'm suggesting this? The fact that a call to foo() from a 
> __builtin_amdgcn_processor_is block shouldn't imply anything about other 
> calls to foo().
>


Perhaps I am misunderstanding, in which case I apologise. I started from: "We 
can tell, statically, that the first call is correctly guarded by an if 
statement: it's guaranteed it will never run on a non-gfx9000 processor. The 
second call, on the other hand, is not. So we can add a frontend rule: **the 
first call is legal, the second is not**." I'm saying we cannot really infer 
anything about the legality of a naked call to a builtin either, at this point. 
Because the builtin might be available on many processors / processors other 
than gfx9000. We can develop the argument to say "well, fine, what we actually 
meant here is `is_invocable`, rather than `processor_is`, and then things work 
out", but the corollary to that appears to be that if you ever use the 
predicate on a builtin, you must touch every other use of that builtin within 
at least the same function, and relate it to the predicate evaluation.
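To make the distinction concrete, here is a minimal host-side sketch. Everything in it is an assumption for illustration: `processor_is` and `is_invocable_foo` are plain C++ stand-ins rather than the real builtins, `foo()` is the placeholder builtin from the discussion, and `gfx9001` is an invented second processor that also provides `foo()`.

```cpp
#include <array>
#include <cassert>
#include <string_view>

// Hypothetical: processors on which the builtin foo() exists. gfx9000 comes
// from the thread; gfx9001 is invented here purely for illustration.
constexpr std::array<std::string_view, 2> kFooTargets = {"gfx9000", "gfx9001"};

// Host-side stand-in for an identity check in the style of processor_is.
constexpr bool processor_is(std::string_view target, std::string_view name) {
  return target == name;
}

// Host-side stand-in for a capability check in the style of is_invocable,
// specialised to foo().
constexpr bool is_invocable_foo(std::string_view target) {
  for (std::string_view t : kFooTargets)
    if (t == target) return true;
  return false;
}

// Models `if (processor_is("gfx9000")) foo();`: the call runs only on gfx9000.
constexpr bool guarded_call_runs(std::string_view target) {
  return processor_is(target, "gfx9000");
}

// Models a naked `foo();`: legal wherever foo() exists, with or without a
// guard elsewhere in the function.
constexpr bool naked_call_legal(std::string_view target) {
  return is_invocable_foo(target);
}

// On the invented gfx9001 the guarded call never runs, yet a naked call is
// still legal, so the guard says nothing about the naked call.
static_assert(!guarded_call_runs("gfx9001"));
static_assert(naked_call_legal("gfx9001"));
```

Under these assumptions, the presence of a `processor_is` guard around one call carries no information about whether an unguarded call to the same builtin is legal.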

> What I'm basically suggesting is just exposing SPIR-V specialization 
> constants as a C construct. Your example SPIR-V was something like:
> 
> ```
> %cmp = OpIEqual %bool %runtime_known_hw_id %hw_id_that_supports_feature
> if (%cmp = true) {
> /* some feature */
> } else {
> /* other feature */
> }
> ```
> 
> We want to come up with a corresponding C construct that's guaranteed to 
> compile to valid SPIR-V. My suggestion is something like:
> 
> ```
> if (__runtime_known_hw_id_eq("hw_id_that_supports_feature")) {
>   /* some feature */
> }
> ```
>

I'm confused as to how this differs from what this PR does, which does 
generate valid SPIRV / LLVM IR. Perhaps there is an underlying assumption that 
there is some construct that makes the otherwise dead block still contain valid 
code, and there really isn't. There's an example I provided above where what is 
guarded is (static) finite resource allocation, not just the use of an 
intrinsic; we'd not know in the FE which is correct, and we cannot allocate 
both until we know the target at JIT / finalisation time (so before executing 
the code), and we cannot generate executable code with both allocation requests 
live, as the finite resource gets exhausted. So the only place where we can 
meaningfully deal with this is in the ME / over IR, before hitting the BE. We 
should be careful to avoid focusing on the `processor_is` / `hw_id` aspect, as 
this leads to brittle code that has to constantly grow additional identity 
checks via `||` disjunction.
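The resource-allocation point can be sketched as a toy host-side model. All names and numbers are hypothetical: `GuardedAlloc`, the 64 KiB budget, the allocation sizes and the `gfx9001` processor are invented for illustration, and this is not how the PR implements anything.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical model of a static allocation that sits in a branch guarded by
// a late-resolved predicate.
struct GuardedAlloc {
  std::string_view required_target;  // the branch is live only on this processor
  std::size_t bytes;                 // static allocation requested in that branch
};

// Assumed size of the finite resource; the number is illustrative only.
constexpr std::size_t kBudgetBytes = 64 * 1024;

// Demand if no predicate is resolved: every guarded branch stays live.
std::size_t unresolved_demand(const std::vector<GuardedAlloc>& allocs) {
  std::size_t total = 0;
  for (const auto& a : allocs) total += a.bytes;
  return total;
}

// Demand after the predicates are resolved for a concrete target: branches
// guarded by a different processor are dropped before they reach the BE.
std::size_t resolved_demand(const std::vector<GuardedAlloc>& allocs,
                            std::string_view target) {
  std::size_t total = 0;
  for (const auto& a : allocs)
    if (a.required_target == target) total += a.bytes;
  return total;
}
```

With, say, a 48 KiB request guarded for `gfx9000` and a 40 KiB request guarded for the invented `gfx9001`, `unresolved_demand` exceeds `kBudgetBytes`, while `resolved_demand` for either processor fits. That is the sense in which both requests cannot stay live, and the FE cannot pick one without knowing the target.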

> In the body of the if statement, you can use whatever intrinsics are legal on 
> hw_id_that_supports_feature.
> 
> > we're just sliding in immediately after Clang, before optimisation
> 
> Isn't doing checks immediately after IR generation basically the same as 
> checking the AST, just on a slightly different representation?

Not in this case. There are at least two aspects that make a difference:

- linking in bitcode, which can allow more extensive analysis than what you can 
do per TU in the AST; this is minor, but please note the conversation 
above about having to be conservative around external symbols, and the risks of 
leaving them around;
- lack of information when generating the AST, when dealing with abstract 
targets like SPIRV (more specifically, AMDGCN flavoured SPIRV, for the purposes 
of this PR)
  - the FE targets `amdgcnspirv`, which is generic across all concrete AMDGPU 
targets (union of features);
  - the predicates proposed here offer customisation points for which the 
resolution is deferred to the point where the target is known;
  - we only know the concrete target when we are finalising, which happens at a 
completely different time-point, on possibly a different machine;
  - we cannot time-travel to inform the AST about this, but we can compose 
generic IR with target IR, and lower it as target IR (this is already how 
various flavours of device / offload libs work, so it's hardly novel).

None of the above matters for concrete targets, where we just resolve 
everything in the AST already, because we have full information in the FE.
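The two-phase resolution described above can be sketched as follows. This is a host-side toy model under stated assumptions: `frontend_resolve` and `finalise_resolve` are hypothetical names, and a plain string comparison stands in for the real predicate.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// What the FE can say about "is the processor `queried`?" given the target it
// compiles for. An empty optional means "unknown here, defer the answer".
std::optional<bool> frontend_resolve(std::string_view fe_target,
                                     std::string_view queried) {
  if (fe_target == "amdgcnspirv") return std::nullopt;  // abstract target
  return fe_target == queried;                          // concrete target
}

// At finalisation the concrete target is known, so any deferred predicate can
// be folded to a constant; an answer the FE already gave is kept as-is.
bool finalise_resolve(std::optional<bool> fe_result,
                      std::string_view concrete_target,
                      std::string_view queried) {
  if (fe_result.has_value()) return *fe_result;
  return concrete_target == queried;
}
```

For a concrete FE target the first function already yields a definite answer, matching the "resolve everything in the AST" case; for `amdgcnspirv` the answer only materialises in the second step, once the concrete target is supplied.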

https://github.com/llvm/llvm-project/pull/134016
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
