On Fri, Jun 03, 2016 at 02:26:09PM +0200, Torvald Riegel wrote:
> And that would be fine, IMO. If you can't even load atomically, doing
> something useful with this type will be hard except in special cases.
> Also, doing a CAS (compare-and-swap) and thus potentially bringing in
> the cache line in exclusive mode can be a lot more costly than what
> users might expect from a load. A short critical section might not be
> much slower.
>
> If you only have a CAS as base of the atomic operations on a type, then
> a CAS operation exposed to the user will still be just a single HW
> CAS. But any other operation besides the CAS and a load will need *two*
> CAS operations; even an atomic store has to be implemented as a CAS
> loop.

Would we then just stop expanding all those __atomic_*/__sync_* builtins
inline (which would IMHO break tons of stuff), or would it just be some
predicate that the atomic.h/atomic headers use?
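
For concreteness, here is a minimal sketch of the CAS-only scheme quoted
above, written with the __atomic builtins on a 64-bit object. The helper
names cas_only_load and cas_only_store are made up for illustration, and
this is not claimed to be what GCC actually expands to. Depending on the
target, the 64-bit CAS may need linking against libatomic.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: a "load" built from nothing but CAS.
   We compare against an arbitrary guess (0) and swap in that same value.
   On failure the builtin copies the current contents into 'expected';
   on success the contents were 0 and 0 is written back.  Either way the
   location has to be writable, which is the read-only-memory problem. */
static uint64_t cas_only_load(uint64_t *p)
{
    uint64_t expected = 0;
    __atomic_compare_exchange_n(p, &expected, 0, false,
                                __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
    return expected;
}

/* Hypothetical helper: a "store" built from CAS.  One CAS to learn the
   current value, then at least one more to replace it, retrying whenever
   another thread changed the value in between. */
static void cas_only_store(uint64_t *p, uint64_t val)
{
    uint64_t expected = cas_only_load(p);
    while (!__atomic_compare_exchange_n(p, &expected, val, false,
                                        __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST))
        ;   /* 'expected' was refreshed by the failed CAS; try again */
}
```

Note that even the uncontended store path issues two CAS operations, one
inside cas_only_load and one in the loop.
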
> > But doesn't that mean you should fall back to locked operation also for any
> > other atomic operation on such types, because otherwise if you atomic_store
> > or any other kind of atomic operation, it wouldn't use the locking, while
> > for atomic load it would?
>
> I suppose you mean that one must fall back to using locking for all
> operations? If load isn't atomic, then it can't be made atomic using
> external locks if the other operations don't use the locks.
>
> > That would be an ABI change and quite significant
> > pessimization in many cases.
>
> A change from wide CAS to locking would be an ABI change I suppose, but
> it could also be considered a necessary bugfix if we don't want to write
> to read-only memory. Does this affect anything but i686?

Also x86_64 (for 128-bit atomics), clearly also either arm or aarch64
(judging from who initiated this thread), I bet there are many others.
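
To illustrate the point quoted above that a lock only makes the load
atomic if every other operation takes the same lock, here is a rough
sketch of a locked fallback for a 16-byte object. The type wide_t, the
single global spinlock and all function names are assumptions made up
for this example, not an existing implementation.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { uint64_t lo, hi; } wide_t;  /* hypothetical 128-bit payload */

static bool wide_lock;  /* hypothetical single global spinlock flag */

static void wide_acquire(void)
{
    /* Spin until the flag was previously clear. */
    while (__atomic_test_and_set(&wide_lock, __ATOMIC_ACQUIRE))
        ;
}

static void wide_release(void)
{
    __atomic_clear(&wide_lock, __ATOMIC_RELEASE);
}

/* The load only reads *p, so it also works on read-only memory. */
static wide_t locked_load(const wide_t *p)
{
    wide_acquire();
    wide_t v = *p;
    wide_release();
    return v;
}

/* The store must take the same lock, or the load above is not atomic. */
static void locked_store(wide_t *p, wide_t v)
{
    wide_acquire();
    *p = v;
    wide_release();
}

/* So must the CAS: mixing a hardware wide CAS with locked loads would
   let the CAS modify the object in the middle of a locked read. */
static bool locked_cas(wide_t *p, wide_t *expected, wide_t desired)
{
    wide_acquire();
    bool ok = p->lo == expected->lo && p->hi == expected->hi;
    if (ok)
        *p = desired;
    else
        *expected = *p;
    wide_release();
    return ok;
}
```

Code compiled against the old inline wide CAS would bypass wide_lock
entirely, which is why switching an existing type over is an ABI change.
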
Jakub