On Mon, 29 Jun 2020 21:32:41 +0100
Andrew Stubbs <a...@codesourcery.com> wrote:

> On 29/06/2020 21:16, Julian Brown wrote:
> > Data-share write (ds_write) instructions do not necessarily complete
> > the write to LDS immediately. When a write completes, LGKM_CNT is
> > decremented. For now, we wait until LGKM_CNT reaches zero after each
> > ds_write instruction.
> > 
> > This fixes a race condition in the case where LDS is read
> > immediately after being written. This can happen with broadcast
> > operations.
> > 
> > OK for og10 branch?  
> 
> I'm not saying no (because this issue needs a fix), but the thought 
> occurs that inserting one wait before the barrier might be better
> than inserting a wait after each and every write.
> 
> In particular, it seems logical that any barrier should be a memory 
> barrier, so inserting it in the barrier pattern is not a big deal.
> IIRC, only OpenACC is using that anyway (OpenMP has explicit asm
> inserts in libgomp).

I'd be happier with that idea if ds_{read,write} operations were *only*
used for broadcasting -- but they're not, they may also be used for
(some) gang-private variables and for reduction temporaries. I don't
have a test case for either of those at present demonstrating bad
behaviour with no waitcnt, but I guess it's theoretically possible for
there to be one, at least.
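
To make the concern concrete, here is a toy model in Python (not GCC
code) comparing the two placements. The instruction lists and all
helper names are hypothetical, and the model conservatively treats any
ds_read issued while a ds_write is outstanding as a potential hazard;
whether the hardware actually misbehaves in that case is exactly what
is not known.

```python
WAIT = "s_waitcnt lgkmcnt(0)"

def wait_after_each_write(insns):
    """Placement from the patch: one wait after every ds_write."""
    out = []
    for insn in insns:
        out.append(insn)
        if insn.startswith("ds_write"):
            out.append(WAIT)
    return out

def wait_before_barrier(insns):
    """Suggested alternative: one wait in front of each s_barrier."""
    out = []
    for insn in insns:
        if insn == "s_barrier":
            out.append(WAIT)
        out.append(insn)
    return out

def unguarded_reads(insns):
    """Indices of ds_reads issued while a ds_write may be outstanding."""
    pending = False
    bad = []
    for i, insn in enumerate(insns):
        if insn.startswith("ds_write"):
            pending = True
        elif insn == WAIT:
            pending = False
        elif insn.startswith("ds_read") and pending:
            bad.append(i)
    return bad

# Hypothetical sequences: a broadcast goes through a barrier, whereas a
# gang-private or reduction temporary might be read with no barrier.
broadcast = ["ds_write_b32", "s_barrier", "ds_read_b32"]
private = ["ds_write_b32", "ds_read_b32"]

for name, seq in (("broadcast", broadcast), ("private", private)):
    print(name,
          unguarded_reads(wait_after_each_write(seq)),
          unguarded_reads(wait_before_barrier(seq)))
```

In this model both placements cover the broadcast sequence, but only
the wait-after-write placement covers the barrier-free one.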

The "proper" solution is a general (& "optimal") waitcnt insertion
pass, I think, that works with other memory operations as well as the
DS ones.
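
A very rough sketch of what such a pass might do, again as a toy in
Python rather than anything resembling the real back end: track
whether an LDS write is outstanding and emit a wait only when a later
ds_read or s_barrier needs it. The function name and the string-based
instruction representation are assumptions for illustration, and it
handles only straight-line code and only the DS case.

```python
WAIT = "s_waitcnt lgkmcnt(0)"

def insert_waits_lazily(insns):
    """Insert a wait only where a pending ds_write meets a consumer."""
    out = []
    pending = False  # True while a ds_write may not have completed
    for insn in insns:
        needs_sync = insn == "s_barrier" or insn.startswith("ds_read")
        if pending and needs_sync:
            out.append(WAIT)
            pending = False
        out.append(insn)
        if insn.startswith("ds_write"):
            pending = True
        elif insn == WAIT:
            # A wait already present in the input also clears the state.
            pending = False
    return out

# Hypothetical input: two writes, then a barrier, then a read.
seq = ["ds_write_b32", "ds_write_b32", "s_barrier", "ds_read_b32"]
print(insert_waits_lazily(seq))
```

Here the two writes share a single wait in front of the barrier,
instead of getting one each. A real pass would also have to deal with
control flow and with the other counters and memory operations.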

Thanks,

Julian
