[powerpc64le] seq_cst memory order possibly not honored
Hi, I'm having a problem with one of the Boost.Atomic tests on a PowerPC64 LE test platform. The test is running two threads which are looping code like this: Thread 1 Thread 2 [initially a == 0 && b == 0] a.store(1, seq_cst); b.store(1, seq_cst); a.load(relaxed); b.load(relaxed); x = b.load(relaxed); y = a.load(relaxed); On each iteration the test verifies that !(x == 0 && y == 0) and this check fails. As far as I can tell the test is valid and it indeed passes on x86. Boost.Atomic uses __atomic* intrinsics in both cases, so from C++ perspective the implementation is the same. I don't have the access to the tester that fails, so I can't tell the exact GCC version that is used there, only that it is labeled as 6.0. I did some experimenting on the version 4.9.2 that I have locally and found out that for code like this: __atomic_store_n(&n, 1, __ATOMIC_SEQ_CST); gcc generates this assembly: 1c: 01 00 40 39 li r10,1 20: ac 04 00 7c sync 24: 00 00 49 91 stw r10,0(r9) I would expect that for seq_cst there should be a second 'sync' right after the store, but it is absent. If 6.0 generates the similar code then this might explain the test failures I'm seeing. So my questions are: 1. Is my test valid or is there a flaw that I'm missing? 2. Am I correct about the missing 'sync' instruction? Thanks. PS: If you're interested, here's the full code snippet I compiled: int n = 0; int main() { __atomic_store_n(&n, 1, __ATOMIC_SEQ_CST); return n; } The command line was: powerpc64le-linux-gnu-g++ -g -O0 -o seq_cst_ppc64el.o -c ./seq_cst.cpp And here's the link to the original Boost.Atomic test: https://github.com/boostorg/atomic/blob/develop/test/ordering.cpp
Re: [powerpc64le] seq_cst memory order possibly not honored
On 14.08.2015 11:51, Jonathan Wakely wrote: On 14 August 2015 at 01:37, Andrey Semashev wrote: 1. Is my test valid or is there a flaw that I'm missing? The cppmem tool at http://svr-pes20-cppmem.cl.cam.ac.uk/cppmem/ shows that there are consistent executions where (x==0 && y==0) is true. I used this code: int main() { atomic_int a = 0; atomic_int b = 0; int x, y; {{{ { a.store(1,mo_seq_cst); a.load(mo_relaxed); x = b.load(mo_relaxed); } ||| { b.store(1, mo_seq_cst); b.load(mo_relaxed); y = a.load(mo_relaxed); } }}} return 0; } I'm not familiar with that tool and not sure how to interpret its output, but I think it misses one point that is present in the original test and I failed to mention. The Boost.Atomic test uses Boost.Thread barriers to ensure that both threads have executed the store-load-load region before checking for x and y. Otherwise I cannot see how (x==0 && y==0) could happen. The last load in each thread is sequenced after the first seq_cst store and both stores are ordered with respect to each other, so one of the threads should produce 1.
Re: [powerpc64le] seq_cst memory order possibly not honored
On 14.08.2015 13:19, Jonathan Wakely wrote: On 14 August 2015 at 10:54, Andrey Semashev wrote: Otherwise I cannot see how (x==0 && y==0) could happen. The last load in each thread is sequenced after the first seq_cst store and both stores are ordered with respect to each other, so one of the threads should produce 1. The tool evaluates the possible executions according to a formal model of the C++ memory model, so is invaluable for answering questions like this. It shows that there is no sychronizes-with (shown as "sw") relationship between the seq_cst store and the relaxed load for each atomic object. There is a total order of sequentially consistent operations, but the loads are not sequentially consistent and do not synchronize with the stores. Thank you Jonathan, you are correct. I've changed the test to use seq_cst on loads as well and also removed the first load as it doesn't really matter for the test. I'll see if it helps the tester. I'm still not entirely sure if the missing 'sync' instruction is a, let's say, desirable code, from practical point of view. I understand that the seq_cst load will generate an extra 'sync' which will ensure the stored 1 is visible to the other thread. However, if there is no second instruction, i.e. thread 1 performs a store(seq_cst) and thread 2 performs a load(seq_cst) of the same atomic variable, the second thread may not observe the stored value until thread 1 performs another instruction involving 'sync' (or the CPU flushes the cache line for some reason). This increases latencies of inter-thread communication.