efriedma added inline comments.

================
Comment at: lib/CodeGen/CGBuiltin.cpp:3003
+  case Builtin::BI_InterlockedCompareExchangePointer:
+  case Builtin::BI_InterlockedCompareExchangePointer_nf: {
     llvm::Type *RTy;
----------------
efriedma wrote:
> efriedma wrote:
> > rnk wrote:
> > > mgrang wrote:
> > > > rnk wrote:
> > > > > Is the no fence version really equivalent to this seq_cst version 
> > > > > here?
> > > > I don't see InterlockedCompareExchangePointer creating a fence but it 
> > > > should. (since it is the fence version). So maybe that needs to be 
> > > > fixed (and possibly other fence functions).
> > > > 
> > > > InterlockedCompareExchangePointer_nf should not create a fence so the 
> > > > current implementation seems correct.
> > > Hm, if we're supposed to have fences, we're probably missing them 
> > > everywhere. I thought the atomicrmw orderings were enough, but that shows 
> > > what I know about the C++ memory model. =p
> > > 
> > > We don't generate a fence for this intrinsic when MSVC does:
> > > ```
> > > unsigned char test_arm(long *base, long idx) {
> > >   return _interlockedbittestandreset_acq(base, idx);
> > > }
> > > ```
> > > 
> > > @efriedma, is MSVC over-emitting extra fences, or are we under-emitting? 
> > > If this is their "bug", should we be bug-for-bug compatible with it so we 
> > > get the same observable memory states?
> > "create a fence" is a little confusing because the AArch64 ldaxr/stlxr have 
> > builtin fences... but presumably the "nf" version should use ldxr/stxr 
> > instead.  At the IR level, that corresponds to "monotonic".
> In terms of emitting fences, maybe we are actually missing a fence.
> 
> An ldaxr+stlxr pair isn't a complete fence, at least in theory; it stops 
> loads from being hoisted past it, and stores from being sunk past it, but 
> stores can be hoisted and loads can be sunk.  That said, a stronger fence 
> isn't necessary to model C++11 atomics; C++11 only requires sequential 
> consistency with other sequentially consistent operations, plus an 
> acquire/release barrier.  And a program that isn't using C++11 relaxed 
> atomics can't depend on a stronger barrier without triggering a data race.
> 
> A program written before C++11 atomics were generally available could 
> theoretically depend on the barrier through a series of operations that cause 
> a data race under the C++11 rules.  Since there were no clear rules before 
> C++11, programs did things like emulate relaxed atomics on top of volatile 
> pointers; using such a construct, a program could depend on the implied 
> barrier.  For this reason, gcc emits a fence on AArch64 after `__sync_*` (but 
> not `__atomic_*`).  And I guess MSVC does something similar for 
> `_Interlocked*`.
> 
> That said, LLVM has never emitted a dmb for `__sync_*` operations on AArch64, 
> and it hasn't led to any practical problems as far as I know.
> it stops loads from being hoisted past it, and stores from being sunk past it, 
> but stores can be hoisted and loads can be sunk.

Looking over it again, this bit is a poor description of what actually 
happens. There are really two issues: one, an operation from after an 
ldaxr/stlxr pair can end up between the ldaxr and the stlxr, and two, a cmpxchg 
doesn't always execute the stlxr (the store is skipped when the comparison 
fails).


Repository:
  rC Clang

https://reviews.llvm.org/D52807



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
