On Tue, 15 Nov 2022, Jonathan Wakely via Gcc-patches wrote:

> > @item -mrelax-cmpxchg-loop
> > @opindex mrelax-cmpxchg-loop
> >-Relax cmpxchg loop by emitting an early load and compare before cmpxchg,
> >-execute pause if load value is not expected. This reduces excessive
> >-cachline bouncing when and works for all atomic logic fetch builtins
> >-that generates compare and swap loop.
> >+For compare and swap loops that emitted by some __atomic_* builtins
> 
> s/that emitted/that are emitted/
> 
> >+(e.g. __atomic_fetch_(or|and|xor|nand) and their __atomic_*_fetch
> >+counterparts), emit an atomic load before cmpxchg instruction. If the
> 
> s/before cmpxchg/before the cmpxchg/
> 
> >+loaded value is not equal to expected, execute a pause instead of
> 
> s/not equal to expected/not equal to the expected/
> 
> >+directly run the cmpxchg instruction. This might reduce excessive
> 
> s/directly run/directly running/

This results in "... execute a pause instead of directly running the
cmpxchg instruction", which needs further TLC because:

* 'a pause' should be 'the PAUSE instruction';
* 'directly running [an instruction]' does not seem correct in context.

The option also applies to __sync builtins, not just __atomic.


How about the following:

When emitting a compare-and-swap loop for @ref{__sync Builtins}
and @ref{__atomic Builtins} lacking a native instruction, optimize
for the highly contended case by issuing an atomic load before the
@code{CMPXCHG} instruction, and invoke the @code{PAUSE} instruction
when restarting the loop.
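For readers of this thread, the loop shape being described might be sketched in plain C roughly as follows (an illustrative approximation, not GCC's actual RTL expansion; the `cpu_pause` macro and function name are hypothetical):

```c
#include <assert.h>
#include <stdbool.h>

/* Approximate the compare-and-swap loop emitted for
   __atomic_fetch_or when no native instruction exists.
   With -mrelax-cmpxchg-loop, an atomic load precedes the
   CMPXCHG and a PAUSE is issued when the loop restarts.  */

#if defined(__i386__) || defined(__x86_64__)
#  define cpu_pause() __builtin_ia32_pause ()   /* x86 PAUSE */
#else
#  define cpu_pause() ((void) 0)                /* no-op elsewhere */
#endif

static unsigned
relaxed_fetch_or (unsigned *ptr, unsigned mask)
{
  for (;;)
    {
      /* Early atomic load: avoid a failing CMPXCHG (and the
         cache-line ownership it would take) under contention.  */
      unsigned expected = __atomic_load_n (ptr, __ATOMIC_RELAXED);
      unsigned desired = expected | mask;
      if (__atomic_compare_exchange_n (ptr, &expected, desired,
                                       false, __ATOMIC_SEQ_CST,
                                       __ATOMIC_RELAXED))
        return expected;
      cpu_pause ();   /* back off before restarting the loop */
    }
}
```

The point of the early load is that a CMPXCHG that is going to fail still acquires the cache line exclusively, so checking the value first keeps highly contended loops from bouncing the line between cores.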

Alexander
