[clang] [llvm] [AMDGPU] Introduce asyncmark/wait intrinsics (PR #173259)

Sameer Sahasrabuddhe via cfe-commits Mon, 12 Jan 2026 21:09:53 -0800

ssahasra wrote:

> To be honest, I'm inclined to say that we _can_ just wait for asynccnt / 
> tensorcnt conservatively at true function call boundaries (i.e. in the ABI). 
> It certainly doesn't hurt ML workloads, since those don't have true function 
> calls. HPC workloads might have different opinions about that, and perhaps 
> @ssahasra's ideas around function attributes could help with that, relaxing 
> the counter waits we insert?


We could do all of that, but I just don't see how any of this is precluded by 
the "base" behaviour:

    "As long as a caller waits correctly for its own marks, anything in the
    callee cannot affect the caller."

The opposite is not required (marks in the caller as seen from inside the 
callee):
- As long as there is an asyncmark in the callee and it is waited for in the 
callee, everything before entering the callee is also waited for by definition 
(because we speak in terms of program order).
- Nothing in the callee should depend directly on marks in the caller; 
that is simply unspecified behaviour.

For library writing, the following should be sufficient:

```
void foo() {
   early_work();
   asyncmark(); // could be many such occurrences
   request_async_transfers();
   asyncmark(); // say this is the N'th mark in foo()
   more_work();
   asyncmark(); // could be many such occurrences
   wait.asyncmark(N-1);
   use_data_from_async_transfers();
}
```

Anything more complicated than this needs a solid, well-defined use case. In 
fact, I would go further and say that if a library exposes a function that does 
its own async loads and then waits for them to finish, it is probably doing 
it wrong. It should split that function into separate functions that either 
launch async transfers or wait for them. That way, the user of the library has 
much better visibility into, and hence control over, the work being scheduled.
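
A rough host-side sketch of that split, to make the shape concrete. Everything 
here is a hypothetical stand-in: `asyncmark()`, `wait_asyncmark()`, 
`lib_launch()`, `lib_wait_and_use()` and `caller()` are illustrative names, not 
the builtins from the PR, and the stubs only record the order of calls. The 
meaning of the wait argument is left to the intrinsic's specification; the 
sketch just passes it through.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins for the operations discussed above. They only
// append to a trace so the call order can be inspected on a host machine.
static std::vector<std::string> trace;
static void asyncmark() { trace.push_back("mark"); }
static void wait_asyncmark(unsigned n) {
  trace.push_back("wait(" + std::to_string(n) + ")");
}
static void request_async_transfers() { trace.push_back("request"); }
static void use_data_from_async_transfers() { trace.push_back("use"); }

// Library half 1: launches the transfers and marks them, nothing else.
void lib_launch() {
  request_async_transfers();
  asyncmark();
}

// Library half 2: waits and then consumes the data. The wait argument is
// chosen by the caller and simply forwarded (assumption for this sketch).
void lib_wait_and_use(unsigned wait_arg) {
  wait_asyncmark(wait_arg);
  use_data_from_async_transfers();
}

// The caller now decides what work to schedule between launch and wait.
std::vector<std::string> caller(unsigned wait_arg) {
  trace.clear();
  lib_launch();
  trace.push_back("caller_work");  // caller's own work, overlapping the transfers
  lib_wait_and_use(wait_arg);
  return trace;
}
```

With a single monolithic function, the `caller_work` step could not be placed 
between the launch and the wait; with the split, that placement is the 
caller's choice.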

https://github.com/llvm/llvm-project/pull/173259
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
