On Tue, Mar 11, 2014 at 09:12:53PM +0000, John Carr wrote:
> Will Deacon <will.dea...@arm.com> wrote:
> > On Tue, Mar 11, 2014 at 02:54:18AM +0000, John Carr wrote:
> > > A comment in arm/sync.md notes "We should consider issuing a inner
> > > shareability zone barrier here instead."  Here is my first attempt
> > > at a patch to emit weaker memory barriers.  Three instructions seem
> > > to be relevant for user mode code on my Cortex A9 Linux box:
> > >
> > >    dmb ishst, dmb ish, dmb sy
> > >
> > > I believe these correspond to a release barrier, a full barrier
> > > with respect to other CPUs, and a full barrier that also orders
> > > relative to I/O.
> >
> > Not quite; DMB ISHST only orders writes with other writes, so loads can move
> > across it in both directions. That means it's not sufficient for releasing a
> > lock, for example.
>
> Release in this context doesn't mean "lock release".  I understand
> it to mean release in the specific context of the C++11 memory model.
> (Similarly, if you're arguing standards compliance "inline" really
> means "relax the one definition rule for this function.")
>
> I don't see a prohibition on moving non-atomic loads across a release
> store.  Can you point to an analysis that shows a full barrier is needed?

Well, you can use acquire/release to implement a lock easily enough. For
example, try feeding the following to cppmem:

  int main() {
    int x = 0, y = 0;
    atomic_int z = 0;

    {{{ { r1 = x; y = 1;
          z.store(1, memory_order_release); }
    ||| { r0 = z.load(memory_order_acquire).readsvalue(1);
          r1 = y; x = 1; }
    }}}

    return 0;
  }

There is one consistent execution, which requires the first thread to have
r1 == 0 (i.e. read x as zero) and the second thread to have r1 == 1 (i.e.
read y as 1).

If we implement store-release using DMB ISHST, the assembly code would look
something like the following (I've treated the atomic accesses like normal
load/store instructions for clarity, since they don't affect the ordering
here):

  T0:
  LDR r1, [x]
  STR #1, [y]
  DMB ISHST
  STR #1, [z]

  T1:
  LDR r0, [z] // Reads 1
  DMB ISH
  LDR r1, [y]
  STR #1, [x]

The problem with this is that the LDR in T0 can be re-ordered *past* the
rest of the sequence, potentially resulting in r1 == 1, which is forbidden.
It's just like reading from a shared, lock-protected data structure without
the lock held.

> If we assume that gcc is used to generate code for processes running
> within a single inner shareable domain, then we can start by demoting
> "dmb sy" to "dmb ish" for the memory barrier with no other change.

I'm all for such a change.

> If a store-store barrier has no place in the gcc atomic memory model,
> that supports my hypothesis that a twisty maze of ifdefs is superior to
> a "portable" attractive nuisance.

I don't understand your point here.

Will
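
P.S. For anyone who would rather run the litmus test natively than through
cppmem, here is a rough C++11 sketch of the same two threads. The names
Result and run_once are made up for illustration, and the spin loop stands
in for cppmem's readsvalue(1) constraint. On a conforming implementation
every run should give T0 r1 == 0 and T1 r1 == 1. A run that never fails
proves nothing about the barrier choice, of course, since the reordering is
only permitted by the hardware, not guaranteed.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper type: the two "r1" registers from the litmus test.
struct Result {
    int t0_r1;  // value thread 0 read from x
    int t1_r1;  // value thread 1 read from y
};

// Runs the two-thread litmus test once and reports what each thread read.
Result run_once() {
    int x = 0, y = 0;          // plain, non-atomic variables
    std::atomic<int> z{0};     // the flag used for release/acquire
    Result res{-1, -1};

    std::thread t0([&] {
        res.t0_r1 = x;         // must not be reordered after the release store
        y = 1;
        z.store(1, std::memory_order_release);
    });

    std::thread t1([&] {
        // Spin until the acquire load observes the release store.
        while (z.load(std::memory_order_acquire) != 1) {
        }
        res.t1_r1 = y;         // ordered after T0's write to y
        x = 1;                 // ordered after T0's read of x
    });

    t0.join();
    t1.join();
    return res;
}
```

Calling run_once() in a loop and checking both fields is enough to exercise
it; build with thread support enabled (e.g. -pthread with gcc).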