On Sep 12, 2011, at 03:02, Paolo Bonzini wrote:

> On 09/11/2011 09:00 PM, Geert Bosch wrote:
>> So, if I understand correctly, then operations using relaxed memory
>> order will still need fences, but indeed do not require any
>> optimization barrier. For memory_order_seq_cst we'll need a full
>> barrier, and for the others there is a partial barrier.
> 
> If you do not need an optimization barrier, you do not need a processor 
> barrier either, and vice versa.  Optimizations are just another factor that 
> can lead to reordered loads and stores.

Assuming that statement is true, that would imply that even for relaxed 
ordering there has to be an optimization barrier. Clearly fences need to be 
used for any atomic accesses, including those with relaxed memory order.

Consider 4 threads and an atomic int x:

thread 1  thread 2  thread 3  thread 4
--------  --------  --------  --------
  x=1;      r1=x;     x=3;      r3=x;
  x=2;      r2=x;     x=4;      r4=x;

Even with relaxed memory ordering, all modifications to x have to occur in some 
particular total order, called the modification order of x.

So, even if each thread preserves its store order, the modification order of x 
can be any of:
  1,2,3,4
  1,3,2,4
  1,3,4,2
  3,1,2,4
  3,1,4,2
  3,4,1,2

Because there is a single modification order for x, it would be an error for 
thread 2 and thread 4 to see a different update order.

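So, if r1==2,r2==3 and r3==4,r4==1, that would be an error: thread 2 saw 2 
before 3 and thread 4 saw 4 before 1, which together with 1 before 2 and 3 
before 4 leaves no consistent order. However, without fences, this can easily 
happen on an SMP machine, even one with a nice memory model such as x86.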

IIUC, relaxed memory order mostly seems to allow movement (by compiler and 
CPU) of unrelated memory operations, but still requires fences between 
successive atomic operations on the same object.

In other words, while atomic operations with relaxed memory order on some 
atomic object X cannot be used to synchronize any operations on objects other 
than X, they themselves cannot cause data races.

  -Geert
