ssahasra wrote:
> To be honest, I'm inclined to say that we _can_ just wait for asynccnt /
> tensorcnt conservatively at true function call boundaries (i.e. in the ABI).
> It certainly doesn't hurt ML workloads, since those don't have true function
> calls. HPC workloads might have different opinions about that, and perhaps
> @ssahasra's ideas around function attributes could help with that, relaxing
> the counter waits we insert?
We could do all of that, but I just don't see how any of this is precluded by
the "base" behaviour:
"As long as a caller waits correctly for its own marks, anything in the
callee cannot affect the caller."
The opposite is not required (marks in the caller as seen from inside the
callee):
- As long as there is an asyncmark in the callee and it is waited for in the
callee, everything before entering the callee is also waited for by definition
(because we speak in terms of program order).
- Nothing in the callee should directly depend on marks in the caller; that
is simply unspecified behaviour.
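
To make the program-order argument concrete, here is a toy model in C++. It is only a sketch under assumptions of my own: `MarkQueue`, `callee` and `caller_mark_done_after_call` are hypothetical names, and the rule "a wait leaves at most `keep` of the most recent marks pending" is an assumed reading of the wait, not a statement of what the hardware or the intrinsics do.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Toy model (an assumption, not the real semantics): every asyncmark is
// appended to a single queue in program order, and marks retire oldest first.
struct MarkQueue {
  std::deque<int> pending; // ids of marks not yet waited for, oldest first
  int next_id = 0;

  // Record a mark and return its id.
  int asyncmark() {
    pending.push_back(next_id);
    return next_id++;
  }

  // Assumed wait rule: retire marks, oldest first, until at most `keep` of
  // the most recent marks are still pending.
  void wait_asyncmark(std::size_t keep) {
    while (pending.size() > keep)
      pending.pop_front();
    assert(pending.size() <= keep);
  }

  bool is_pending(int id) const {
    for (int p : pending)
      if (p == id)
        return true;
    return false;
  }
};

// Hypothetical callee: places its own mark and waits for it.
inline int callee(MarkQueue &q) {
  int m = q.asyncmark();
  q.wait_asyncmark(0); // waits for m and for everything older than m
  return m;
}

// Hypothetical caller: its mark precedes the call in program order, so the
// wait inside the callee covers it as well.
inline bool caller_mark_done_after_call() {
  MarkQueue q;
  int caller_mark = q.asyncmark();
  int callee_mark = callee(q);
  return !q.is_pending(caller_mark) && !q.is_pending(callee_mark);
}
```

In this model the callee never names a caller mark. The caller's mark is retired only because it is older in program order than the mark the callee waits for.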
For library writing, the following should be sufficient:
```
void foo() {
  early_work();
  asyncmark(); // could be many such occurrences
  request_async_transfers();
  asyncmark(); // say this is the N'th mark in foo()
  more_work();
  asyncmark(); // could be many such occurrences
  wait.asyncmark(N-1);
  use_data_from_async_transfers();
}
```
Anything more complicated than this needs a solid, well-defined use case. In
fact I would go further and say that if a library exposes a function that does
its own async loads and then waits for them to finish, its authors are probably
doing it wrong. They should split that function into separate functions that
either launch async transfers or wait for them. That way, the user of the
library has much better visibility of, and hence control over, the work being
scheduled.
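
A minimal sketch of that split, in plain C++ so it stays self-contained. Everything here is hypothetical: `Transfer`, `launch_transfer` and `wait_transfer` are illustrative names, and the "transfer" is an ordinary vector copy deferred until the wait, standing in for a real async load.

```cpp
#include <cassert>
#include <vector>

// Hypothetical handle for a transfer that has been requested but not yet
// waited for. The copy is deferred until the wait purely to keep this sketch
// single-threaded; a real async transfer would already be in flight.
struct Transfer {
  const std::vector<int> *src;
  std::vector<int> *dst;
  bool done;
};

// Launch only: record the request and return immediately.
inline Transfer launch_transfer(const std::vector<int> &src,
                                std::vector<int> &dst) {
  return Transfer{&src, &dst, false};
}

// Wait only: after this returns, the destination holds the data.
inline void wait_transfer(Transfer &t) {
  if (!t.done) {
    *t.dst = *t.src;
    t.done = true;
  }
  assert(t.done);
}

// The caller, not the library, decides what to overlap with the transfer.
inline int caller_overlaps_work() {
  std::vector<int> src{1, 2, 3, 4};
  std::vector<int> dst;
  Transfer t = launch_transfer(src, dst);
  int local = 10; // independent work scheduled between launch and wait
  wait_transfer(t);
  int sum = local;
  for (int v : dst)
    sum += v;
  return sum;
}
```

Because launch and wait are separate entry points, the work placed between them is visible at the call site instead of being buried inside one library function.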
https://github.com/llvm/llvm-project/pull/173259