[powerpc64le] seq_cst memory order possibly not honored

2015-08-13 Thread Andrey Semashev

Hi,

I'm having a problem with one of the Boost.Atomic tests on a PowerPC64 
LE test platform. The test is running two threads which are looping code 
like this:


  Thread 1   Thread 2
 [initially a == 0 && b == 0]
  a.store(1, seq_cst);   b.store(1, seq_cst);
  a.load(relaxed);   b.load(relaxed);
  x = b.load(relaxed);   y = a.load(relaxed);

On each iteration the test verifies that !(x == 0 && y == 0) and this 
check fails. As far as I can tell the test is valid and it indeed passes 
on x86. Boost.Atomic uses __atomic* intrinsics in both cases, so from 
C++ perspective the implementation is the same.


I don't have the access to the tester that fails, so I can't tell the 
exact GCC version that is used there, only that it is labeled as 6.0. I 
did some experimenting on the version 4.9.2 that I have locally and 
found out that for code like this:


  __atomic_store_n(&n, 1, __ATOMIC_SEQ_CST);

gcc generates this assembly:

  1c:   01 00 40 39 li  r10,1
  20:   ac 04 00 7c sync
  24:   00 00 49 91 stw r10,0(r9)

I would expect that for seq_cst there should be a second 'sync' right 
after the store, but it is absent. If 6.0 generates the similar code 
then this might explain the test failures I'm seeing.


So my questions are:

1. Is my test valid or is there a flaw that I'm missing?
2. Am I correct about the missing 'sync' instruction?

Thanks.

PS: If you're interested, here's the full code snippet I compiled:

  int n = 0;

  int main()
  {
__atomic_store_n(&n, 1, __ATOMIC_SEQ_CST);
return n;
  }

The command line was:

  powerpc64le-linux-gnu-g++ -g -O0 -o seq_cst_ppc64el.o -c ./seq_cst.cpp

And here's the link to the original Boost.Atomic test:

https://github.com/boostorg/atomic/blob/develop/test/ordering.cpp


Re: [powerpc64le] seq_cst memory order possibly not honored

2015-08-14 Thread Andrey Semashev

On 14.08.2015 11:51, Jonathan Wakely wrote:

On 14 August 2015 at 01:37, Andrey Semashev wrote:

1. Is my test valid or is there a flaw that I'm missing?


The cppmem tool at http://svr-pes20-cppmem.cl.cam.ac.uk/cppmem/ shows
that there are consistent executions where (x==0 && y==0) is true. I
used this code:

int main() {
   atomic_int a = 0;
   atomic_int b = 0;
   int x, y;
   {{{
 {
   a.store(1,mo_seq_cst);
   a.load(mo_relaxed);
   x = b.load(mo_relaxed);
 }
   |||
 {
   b.store(1, mo_seq_cst);
   b.load(mo_relaxed);
   y = a.load(mo_relaxed);
 }
   }}}
   return 0;
}


I'm not familiar with that tool and not sure how to interpret its 
output, but I think it misses one point that is present in the original 
test and I failed to mention. The Boost.Atomic test uses Boost.Thread 
barriers to ensure that both threads have executed the store-load-load 
region before checking for x and y.


Otherwise I cannot see how (x==0 && y==0) could happen. The last load in 
each thread is sequenced after the first seq_cst store and both stores 
are ordered with respect to each other, so one of the threads should 
produce 1.




Re: [powerpc64le] seq_cst memory order possibly not honored

2015-08-14 Thread Andrey Semashev

On 14.08.2015 13:19, Jonathan Wakely wrote:

On 14 August 2015 at 10:54, Andrey Semashev  wrote:


Otherwise I cannot see how (x==0 && y==0) could happen. The last load in
each thread is sequenced after the first seq_cst store and both stores are
ordered with respect to each other, so one of the threads should produce 1.


The tool evaluates the possible executions according to a formal model
of the C++ memory model, so is invaluable for answering questions like
this.

It shows that there is no sychronizes-with (shown as "sw")
relationship between the seq_cst store and the relaxed load for each
atomic object. There is a total order of sequentially consistent
operations, but the loads are not sequentially consistent and do not
synchronize with the stores.


Thank you Jonathan, you are correct. I've changed the test to use 
seq_cst on loads as well and also removed the first load as it doesn't 
really matter for the test. I'll see if it helps the tester.


I'm still not entirely sure if the missing 'sync' instruction is a, 
let's say, desirable code, from practical point of view. I understand 
that the seq_cst load will generate an extra 'sync' which will ensure 
the stored 1 is visible to the other thread. However, if there is no 
second instruction, i.e. thread 1 performs a store(seq_cst) and thread 2 
performs a load(seq_cst) of the same atomic variable, the second thread 
may not observe the stored value until thread 1 performs another 
instruction involving 'sync' (or the CPU flushes the cache line for some 
reason). This increases latencies of inter-thread communication.