[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

Alex Voicu via cfe-commits Mon, 30 Jun 2025 07:57:14 -0700

AlexVlx wrote:

> We briefly discussed this in the clang area team meeting, and we weren't 
> really happy with the design as-is. The underlying idea behind the feature 
> makes sense, but semantics of the actual builtin is ugly: there's a loose 
> connection between the condition check, and the region of code it's guarding.
>


Whilst I am thankful for the feedback I think it is somewhat unfortunate that 
we could not have a shared discussion about this, since I think that there are 
some core misunderstandings that seem to recur, which makes forward progress 
one way or the other difficult.
 
> I spent a bit more time thinking about it after the meeting. Here's a 
> potential alternative design: we add a new kind of if statement, a 
> "processor-feature-if", spelled something like `if 
> __amdgcn_processor_is("gfx900") {}`. In the body of the if statement, you're 
> allowed to use builtins that would otherwise be illegal. This ensures a 
> direct connection between the feature check and the corresponding builtins, 
> so the frontend can analyze your usage and generate accurate diagnostics.
> 

This has been considered, and doesn't quite address the use case (without 
ending up where the currently proposed design already is). Whilst this would 
have been significantly easier to discuss directly, I will try to enumerate the 
issues here:

1. we should not be touching the HLL at such an intrinsic (pun purely 
accidental) level i.e. we should not inject bespoke keywords / control 
structures etc. - this is extremely risky and can well be weaponised into 
making arguments (Clang / LLVM have implemented this, so it is clearly the way) 
about what should be standardised, and there isn't nearly enough experience to 
warrant that; we'd also be preventing meaningful cross-compiler single source / 
forcing other compilers to implement the exact same novel control structure; 
it's easier to detect a builtin than to detect whether a FE supports a novel 
keyword / control structure;
2. In relation with the above, I think that part of the confusion here is that 
the assumption is that the use case here is  a mechanism like `__device__` i.e. 
it's just about inline specifying a blob of "device" code in some "host" 
source, which is essentially ad-hoc language subsetting / dialect generation - 
that is absolutely not the case;
3. Whilst the discussion is using `processor_is` examples, that is for ease of 
parsing, arguably the `is_invocable` check is significantly more useful, as it 
operates at the right granularity; a particular builtin might be available 
across many architectures, so checking for that rather than a particular 
processor ensures that the code will tightly adapt, without changes; we are not 
just trying to find a way to mirror `-mcpu`;
4. The front-end cannot generate accurate diagnostics for the actual 
interesting case where the target is abstract (AMDGCNSPIRV, or the `generic` 
target @jhuber6 uses in GPU libc, if we extend things in that direction), 
because there isn't enough information - we only know what the concrete target 
is, and hence what features are available, at a point in time which is 
sequenced after the front-end has finished processing (could be run-time JIT 
for SPIR-V, could be bit code linking in a completely different compilation for 
GPU libc etc.);
5. There is not watertight mechanism here in the presence of indirect function 
calls / pointers to function, unless we start infecting function types 
(otherwise stated, the ABI) with their feature requirements, which would be 
extremely unfortunate, and probably intractable (because now things like the 
dynamic linker have to start caring); there is no "safe-design with accurate 
diagnostics" that prevents some user from checking for a predicate, then 
calling, via pointer, a function they imported from a library that is utterly 
incompatible with the invariants established by the predicate; these are 
specialist tools, with sharp edges, of which we do have quite a few already.

> In the case where the target features are known during clang codegen, 
> lowering is easy: you just skip generating the bodies of the if statements 
> that don't match. If you want to some kind of "runtime" (actual runtime, or 
> SPIR-V compilation-time) detection, it's not clear what the LLVM IR should 
> look like: we only support specifying target features on a per-function 
> level. But we can look at that separately.

As I have already mentioned below, this is just duplicating existing 
functionality, possibly in a more verbose and roundabout way. It is also 
already handled by what is being proposed, hence the awareness of it was 
present when the currently proposed design was put together. The interesting 
case is the second one, so, sadly, we cannot just look at that separately (and, 
IMHO, should not come up with novel IR constructs to solve this).

If the core of the objection here is that Clang really doesn't like that we're 
doing semantic checking for these / they have too large a footprint for target 
specific builtins, I can always just delete those bits, have these return bool 
and that's that. It will force us to maintain an OOT delta, so it's not ideal, 
but if that's what it takes to make forward progress, it is what it is.

https://github.com/llvm/llvm-project/pull/134016
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

Reply via email to