Re: should sync builtins be full optimization barriers?

Paolo Bonzini Mon, 12 Sep 2011 23:35:52 -0700

On Tue, Sep 13, 2011 at 03:52, Geert Bosch <bo...@adacore.com> wrote:
> No, it is possible, and actually likely. Basically, the issue is write 
> buffers.
> The coherency mechanisms come into play at a lower level in the
> hierarchy (typically at the last-level cache), which is why we need fences
> to start with to implement things like spin locks.


You need fences on x86 to implement Peterson or Dekker spin locks, but
only because they involve write-read ordering to different memory
locations (I'm mentioning those spin lock algorithms because they do
not require locked memory accesses).  Write-write and read-read
ordering, and write-read ordering for the same location, are
guaranteed by the processor.  The same goes for coherency, which is a
looser property.

However, accesses in those spin lock algorithms are definitely
_not_ relaxed; not all of them, at least.
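
To make the write-read case concrete, here is a rough sketch of
Peterson's lock for two threads using C11 atomics.  It is only an
illustration: the names (peterson_lock, run_peterson_demo, and so on)
are made up for this example, and every access uses the default
sequentially consistent ordering rather than a hand-placed fence.  On
x86 I would expect the compiler to get the store-then-load ordering
with an mfence or an xchg after the stores; the other orderings come
for free.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical names, for illustration only. */
static atomic_int flag[2];   /* flag[i] != 0: thread i wants the lock */
static atomic_int turn;      /* which thread has to wait on a tie */
static long counter;         /* plain variable protected by the lock */
static long iterations;

static void peterson_lock(int self)
{
    int other = 1 - self;

    /* Two stores followed by loads from *different* locations: this
       is the write-read ordering that x86 does not give by itself.
       The seq_cst defaults make the compiler enforce it. */
    atomic_store(&flag[self], 1);
    atomic_store(&turn, other);
    while (atomic_load(&flag[other]) && atomic_load(&turn) == other)
        sched_yield();       /* be polite while spinning */
}

static void peterson_unlock(int self)
{
    atomic_store(&flag[self], 0);
}

static void *worker(void *arg)
{
    int self = *(int *)arg;

    for (long i = 0; i < iterations; i++) {
        peterson_lock(self);
        counter++;           /* critical section */
        peterson_unlock(self);
    }
    return NULL;
}

/* Run two threads doing n increments each; return the final count. */
long run_peterson_demo(long n)
{
    pthread_t threads[2];
    int ids[2] = { 0, 1 };

    counter = 0;
    iterations = n;
    atomic_store(&flag[0], 0);
    atomic_store(&flag[1], 0);
    atomic_store(&turn, 0);

    for (int i = 0; i < 2; i++) {
        int rc = pthread_create(&threads[i], NULL, worker, &ids[i]);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < 2; i++)
        pthread_join(threads[i], NULL);
    return counter;
}
```

If the lock works, run_peterson_demo(n) returns 2*n.  Weakening the
two stores in peterson_lock to relaxed ordering without adding a fence
is exactly the change that breaks it, even on x86.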

> No that's false. Even on systems with nice memory models, such as x86
> and SPARC with a TSO model, you need a fence to avoid that a write-load
> of the same location is forced to make it all the way to coherent memory
> and not forwarded directly from the write buffer or L1 cache.

Not sure about SPARC, but this is definitely false on x86.

Granted, even if you do not have to put fences, those writes are likely
_not_ free.  The processor needs to do more than, say, on PPC, so I
wouldn't be surprised if conflicting memory accesses are considerably
more expensive on x86 than on PPC.  Recently, a colleague of mine tried
replacing optimization barriers with full barriers in one of two
threads implementing a ring buffer; that thread became 30% slower,
but the other thread sped up by basically the same amount.
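
For reference, the kind of code I have in mind looks roughly like the
sketch below.  This is not my colleague's code; the names and the
RING_USE_FULL_BARRIER switch are invented for the example.  It is a
single-producer, single-consumer ring that relies on GCC extensions
and on x86 ordering guarantees, so the compiler-only barrier is an
assumption that would not hold on a weaker architecture such as PPC.

```c
#include <assert.h>

/* Optimization barrier: stops the compiler from moving memory
   accesses across it, emits no instruction. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")
/* Full barrier: also orders the accesses in hardware. */
#define full_barrier()     __sync_synchronize()

#ifdef RING_USE_FULL_BARRIER   /* hypothetical build-time switch */
#define ring_barrier() full_barrier()
#else
#define ring_barrier() compiler_barrier()
#endif

#define RING_SIZE 8            /* must be a power of two */

struct ring {
    int slots[RING_SIZE];
    volatile unsigned head;    /* written only by the producer */
    volatile unsigned tail;    /* written only by the consumer */
};

/* Producer side: returns 1 on success, 0 if the ring is full. */
int ring_push(struct ring *r, int value)
{
    unsigned head = r->head;

    if (head - r->tail == RING_SIZE)
        return 0;
    r->slots[head % RING_SIZE] = value;
    ring_barrier();            /* slot must be written before head */
    r->head = head + 1;
    return 1;
}

/* Consumer side: returns 1 on success, 0 if the ring is empty. */
int ring_pop(struct ring *r, int *value)
{
    unsigned tail = r->tail;

    if (r->head == tail)
        return 0;
    ring_barrier();            /* head must be read before the slot */
    *value = r->slots[tail % RING_SIZE];
    ring_barrier();            /* slot must be read before tail moves */
    r->tail = tail + 1;
    return 1;
}

/* Single-threaded sanity check: FIFO order is preserved. */
int ring_demo(void)
{
    struct ring r = { {0}, 0, 0 };
    int value, popped = 0;

    for (int i = 1; i <= 3; i++) {
        int ok = ring_push(&r, i);
        assert(ok);
        (void)ok;
    }
    while (ring_pop(&r, &value)) {
        popped++;
        assert(value == popped);
    }
    return popped;
}
```

Building one thread's side with RING_USE_FULL_BARRIER defined and the
other without is the experiment described above; the sketch says
nothing about how the timings would come out.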

Paolo
