efriedma added inline comments.
================
Comment at: lib/CodeGen/CGBuiltin.cpp:3003
+  case Builtin::BI_InterlockedCompareExchangePointer:
+  case Builtin::BI_InterlockedCompareExchangePointer_nf: {
     llvm::Type *RTy;
----------------
efriedma wrote:
> efriedma wrote:
> > rnk wrote:
> > > mgrang wrote:
> > > > rnk wrote:
> > > > > Is the no-fence version really equivalent to this seq_cst version here?
> > > > I don't see InterlockedCompareExchangePointer creating a fence, but it should (since it is the fence version). So maybe that needs to be fixed (and possibly other fence functions).
> > > >
> > > > InterlockedCompareExchangePointer_nf should not create a fence, so the current implementation seems correct.
> > > Hm, if we're supposed to have fences, we're probably missing them everywhere. I thought the atomicrmw orderings were enough, but that shows what I know about the C++ memory model. =p
> > >
> > > We don't generate a fence for this intrinsic when MSVC does:
> > > ```
> > > unsigned char test_arm(long *base, long idx) {
> > >   return _interlockedbittestandreset_acq(base, idx);
> > > }
> > > ```
> > >
> > > @efriedma, is MSVC over-emitting extra fences, or are we under-emitting? If this is their "bug", should we be bug-for-bug compatible with it so we get the same observable memory states?
> > "Create a fence" is a little confusing because the AArch64 ldaxr/stlxr have built-in fences... but presumably the "nf" version should use ldxr/stxr instead. At the IR level, that corresponds to "monotonic".
> In terms of emitting fences, maybe we are actually missing a fence.
>
> An ldaxr+stlxr pair isn't a complete fence, at least in theory; it stops loads from being hoisted past it and stores from being sunk past it, but stores can be hoisted and loads can be sunk.
> That said, a stronger fence isn't necessary to model C++11 atomics; C++11 only requires sequential consistency with other sequentially consistent operations, plus an acquire/release barrier. And a program that isn't using C++11 relaxed atomics can't depend on a stronger barrier without triggering a data race.
>
> A program written before C++11 atomics were generally available could theoretically depend on the barrier through a series of operations that cause a data race under the C++11 rules. Since there were no clear rules before C++11, programs did things like emulate relaxed atomics on top of volatile pointers; using such a construct, a program could depend on the implied barrier. For this reason, gcc emits a fence on AArch64 after `__sync_*` (but not `__atomic_*`). And I guess MSVC does something similar for `_Interlocked*`.
>
> That said, LLVM has never emitted a dmb for `__sync_*` operations on AArch64, and it hasn't led to any practical problems as far as I know.

> it stops loads from being hoisted past it and stores from being sunk past it, but stores can be hoisted and loads can be sunk.

Looking over it again, this bit is a really bad description of what actually happens. Really there are two issues: one, an operation from after an ldaxr/stlxr pair could end up between the ldaxr and stlxr, and two, a cmpxchg doesn't always execute the stlxr.


Repository:
  rC Clang

https://reviews.llvm.org/D52807

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits