Re: [PATCH v3 2/3] libatomic: Enable LSE128 128-bit atomics for armv9.4-a

Wilco Dijkstra Mon, 08 Jan 2024 05:06:32 -0800

Hi,

>> Is there no benefit to using SWPPL for RELEASE here?  Similarly for the
>> others.
>
> We started off implementing all possible memory orderings available. 
> Wilco saw value in merging less restricted orderings into more 
> restricted ones - mainly to reduce codesize in less frequently used atomics.
> 
> This saw us combine RELEASE and ACQ_REL/SEQ_CST cases to make functions 
> a little smaller.


Benchmarking showed that LSE and LSE2 RMW atomics have similar performance once
the atomic has acquire, release or both semantics. Given there is already a
significant overhead due to the function call, PLT indirection and argument
setup, it doesn't make sense to add extra taken branches that may mispredict
or cause extra fetch cycles.

The goal for the next GCC release is to inline these instructions directly to
avoid these overheads.

Cheers,
Wilco
