On 25/01/2016 19:23, Alex Bennée wrote:
>>> +     })
>>> +
>>> +#define atomic_mb_set(ptr, i)   ((void)atomic_xchg(ptr, i))
>>> +#define atomic_mb_read(ptr)                       \
>>> +    ({                                            \
>>> +    typeof(*ptr) _val;                            \
>>> +     __atomic_load(ptr, &_val, __ATOMIC_ACQUIRE); \
>>> +    _val;                                         \
>>> +    })
>>
>> Please leave these defined in terms of relaxed accesses and memory
>> barriers.
>
> May I ask why? Doesn't the __ATOMIC_ACQUIRE ensure the load barrier is
> in place? Or should it be a full CST barrier?
First, because they are defined like this in docs/atomics.txt. :)

Second, because the right definition would be __atomic_load(__ATOMIC_SEQ_CST)
and __atomic_store(__ATOMIC_SEQ_CST). These produce even better code than
the memory barriers on x86 but, unfortunately, on POWER they are implemented
in such a way that makes things very slow. Basically, instead of

    mb_read -> load; rmb()
    mb_set  -> wmb(); store(); mb()

POWER uses this:

    mb_read -> load; mb()
    mb_set  -> wmb(); store(); mb()

There are reasons for this, and they are proved in
http://www.cl.cam.ac.uk/~pes20/cppppc/popl079-batty.pdf (I'm not pretending
I understand this paper except inasmuch as it was necessary to write
docs/atomics.txt). However, the cases that actually matter basically never
arise. Again, this is documented in docs/atomics.txt around line 90.

As an alternative to explicit memory barriers, you can use this on POWER:

    mb_read -> load-acquire
    mb_set  -> store-release + mb()

and seq-cst load/store everywhere else.

Paolo