On Mon, 29 Jun 2020 21:32:41 +0100 Andrew Stubbs <a...@codesourcery.com> wrote:
> On 29/06/2020 21:16, Julian Brown wrote:
> > Data-share write (ds_write) instructions do not necessarily complete
> > the write to LDS immediately. When a write completes, LGKM_CNT is
> > decremented. For now, we wait until LGKM_CNT reaches zero after each
> > ds_write instruction.
> >
> > This fixes a race condition in the case where LDS is read
> > immediately after being written. This can happen with broadcast
> > operations.
> >
> > OK for og10 branch?
>
> I'm not saying no (because this issue needs a fix), but the thought
> occurs that inserting one wait before the barrier might be better
> than inserting a wait after each and every write.
>
> In particular, it seems logical that any barrier should be a memory
> barrier, so inserting it in the barrier pattern is not a big deal.
> IIRC, only OpenACC is using that anyway (OpenMP has explicit asm
> inserts in libgomp).

I'd be happier with that idea if ds_{read,write} operations were *only*
used for broadcasting -- but they're not, they may also be used for
(some) gang-private variables and for reduction temporaries. I don't
have a test case for either of those at present demonstrating bad
behaviour with no waitcnt, but I guess it's theoretically possible for
there to be one, at least.

The "proper" solution is a general (& "optimal") waitcnt insertion
pass, I think, that works with other memory operations as well as the
DS ones.

Thanks,

Julian