On Tue, 2018-02-27 at 13:16 +0000, Ruslan Nikolaev via gcc wrote:
> > 3) Torvald pointed out further considerations such as users expecting
> > lock-free atomic loads to be faster than stores.
>
> Is it even true? Is it faster to use some global lock (implemented through
> RMW) than a single RMW operation? If you use this global lock, you will not
> get loads faster than stores.

If GCC declares a type as lock-free, atomic loads on this type will be natively supported through some sort of load instruction. That means they are faster than stores under concurrent accesses, in particular when there are concurrent atomic loads (on all major HW we care about). If there is no natively supported atomic load, GCC will not declare the type to be lock-free. Nobody made a statement about the performance of locks vs. RMWs.
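To make the load/store distinction concrete, here is a minimal C11 sketch. It assumes a target where 64-bit atomics are natively supported (e.g. x86-64). The names shared_value, publish, read_value and shared_value_is_lock_free are hypothetical and exist only for illustration. It queries lock-freedom with the standard atomic_is_lock_free. It also shows that the reader side is a plain atomic load rather than an RMW.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical shared 64-bit value. On mainstream 64-bit targets this
   type is reported lock-free, so loads compile to a native load. */
static _Atomic uint64_t shared_value;

/* Standard C11 query. GCC may fold it to a constant when the answer
   is known at compile time for this size and alignment. */
bool shared_value_is_lock_free(void) {
    return atomic_is_lock_free(&shared_value);
}

/* Writer side: publish a new value with release ordering. */
void publish(uint64_t v) {
    atomic_store_explicit(&shared_value, v, memory_order_release);
}

/* Reader side: for a lock-free type this is a load, not an RMW, so
   concurrent readers can keep the cache line shared instead of each
   needing exclusive ownership of it. */
uint64_t read_value(void) {
    return atomic_load_explicit(&shared_value, memory_order_acquire);
}
```

If a type could only be "loaded" through an RMW (for example, a compare-and-swap that writes back the old value), every reader would need exclusive access to the line. In that case the load would no longer be cheaper than a store, which is why such a type is not declared lock-free.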