Hi,

>> Is there no benefit to using SWPPL for RELEASE here? Similarly for the
>> others.
>
> We started off implementing all possible memory orderings available.
> Wilco saw value in merging less restricted orderings into more
> restricted ones - mainly to reduce codesize in less frequently used atomics.
>
> This saw us combine RELEASE and ACQ_REL/SEQ_CST cases to make functions
> a little smaller.

Benchmarking showed that LSE and LSE2 RMW atomics have similar performance
once the atomic is acquire, release or both. Given there is already a
significant overhead due to the function call, PLT indirection and argument
setup, it doesn't make sense to add extra taken branches that may mispredict
or cause extra fetch cycles...

The goal for the next GCC release is to inline these instructions directly
to avoid these overheads.

Cheers,
Wilco
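
For illustration, the merging of orderings discussed above could be sketched
in C roughly as follows. This is only a sketch under stated assumptions, not
the code from the patch: the enum, its labels and pick_swap_variant() are
hypothetical names, and the treatment of relaxed, consume and acquire is
assumed. The only part taken from the thread is that RELEASE shares a path
with ACQ_REL/SEQ_CST instead of getting its own SWPPL case.

```c
#include <stdatomic.h>

/* Hypothetical labels for the code paths of a 128-bit swap helper.
   The instruction names in the comments are the LSE128 swap forms;
   the enum itself is made up for this sketch. */
enum swap_variant {
    SWAP_RELAXED,  /* no ordering, e.g. SWPP */
    SWAP_ACQUIRE,  /* acquire only, e.g. SWPPA */
    SWAP_ACQ_REL   /* acquire and release, e.g. SWPPAL */
};

/* Map a C11 memory order to one of three paths.  RELEASE is folded
   into the strongest path, so there is no separate SWPPL case and the
   function needs one branch fewer.  How relaxed, consume and acquire
   are handled here is an assumption, not something stated above. */
static enum swap_variant pick_swap_variant(memory_order order)
{
    switch (order) {
    case memory_order_relaxed:
        return SWAP_RELAXED;
    case memory_order_consume:
    case memory_order_acquire:
        return SWAP_ACQUIRE;
    default:
        /* memory_order_release, memory_order_acq_rel,
           memory_order_seq_cst */
        return SWAP_ACQ_REL;
    }
}
```

Using the stronger variant for RELEASE is always correct, since it only adds
ordering; the trade-off is the one described above, a smaller function with
fewer taken branches against a possibly stronger barrier than requested.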