nhaehnle wrote:

> Would the following statement be correct with respect to function calls?
>
> > It is undefined behavior for a function (other than a program entry point)
> > to include async mark operations without corresponding async wait
> > operations, or for such a function to call async operations without calling
> > asyncmark. Equivalently, no function may cause a net increase in the number
> > of outstanding async operations. This allows us to assume that function
> > calls cannot invalidate a local analysis of what hardware counter or
> > counters must be waited on and what their values must decrease to in order
> > to implement a given asyncwait
>
> I know there's been a lot of fiddly points here, but I think this is the
> cleanest approach for something initial and we can relax that restriction
> later if needed.
Given how we typically implement the low-level libraries, we can't say that taken literally. Presumably there's going to be some "nice-looking" way to spell asyncmark and wait.asyncmark in e.g. HIP, and so there will be LLVM IR with actual function calls that have an "unbalanced" set of asyncmark / wait.asyncmark. Obviously those calls are expected to be inlined by the time they reach the backend (at least in an optimized build), but our definition of LLVM IR has to apply consistently.

Point is, we need to distinguish between function calls in LLVM IR vs. "true function calls" that survive through the backend.

To be honest, I'm inclined to say that we *can* just wait for asynccnt / tensorcnt conservatively at true function call boundaries (i.e. in the ABI). It certainly doesn't hurt ML workloads, since those don't have true function calls. HPC workloads might have different opinions about that, and perhaps @ssahasra's ideas around function attributes could help with that, relaxing the counter waits we insert?

https://github.com/llvm/llvm-project/pull/173259
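To make the wrapper-function concern concrete, here is a rough toy model in Python. It is not LLVM IR and not any real API: the op tuples, the names `hip_async_mark` / `hip_async_wait`, and the helpers `effect` and `inline` are all made up for illustration, and the wait is assumed to drain every outstanding mark (the real wait.asyncmark semantics may well be finer-grained). It only shows that a thin wrapper around asyncmark is "unbalanced" when looked at as a function on its own, while its caller is balanced, and that the unbalanced call disappears once the wrapper is inlined.

```python
# Toy model only. A function body is a list of ops:
#   ("mark",)       stands in for asyncmark
#   ("wait",)       stands in for wait.asyncmark, assumed here to drain everything
#   ("call", name)  is a call to another function in the same module


def effect(module, name, stack=()):
    """Summarize a function as "none", "waited", or "pending".

    "pending" means the function can return with marks still outstanding,
    i.e. it would violate the literal reading of the proposed rule.
    """
    if name in stack:
        # Recursive call chain: be conservative.
        return "pending"
    result = "none"
    for op in module[name]:
        if op[0] == "mark":
            result = "pending"
        elif op[0] == "wait":
            result = "waited"
        else:
            # A call: the callee's effect overrides ours unless it does nothing.
            callee = effect(module, op[1], stack + (name,))
            if callee != "none":
                result = callee
    return result


def inline(module, name):
    """Return the body of `name` with every call replaced by the callee's body."""
    body = []
    for op in module[name]:
        if op[0] == "call":
            body.extend(inline(module, op[1]))
        else:
            body.append(op)
    return body


# Hypothetical library wrappers plus one kernel that uses them.
module = {
    "hip_async_mark": [("mark",)],
    "hip_async_wait": [("wait",)],
    "kernel": [("call", "hip_async_mark"), ("call", "hip_async_wait")],
}

for fn in module:
    print(fn, "->", effect(module, fn))
print("kernel after inlining:", inline(module, "kernel"))
```

In this model the wrapper `hip_async_mark` comes out as "pending" while `kernel` comes out as "waited", and after inlining no call remains in `kernel` at all. A rule stated on LLVM IR function calls would reject the wrapper; a rule stated only on calls that survive to the backend would not.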
