On Mon, 25 May 2026 07:41:13 +0000
Konstantin Ananyev <[email protected]> wrote:

> Hi Stephen,
> 
> > The rte_atomic32_cmpset is deprecated. Initial attempts at
> > changing this with direct conversion to
> > rte_atomic_compare_exchange_weak_explicit()
> > regressed MP/MC contended performance on x86 by 10-30%,
> > because the C11 builtin's failure-writeback semantic forces
> > GCC to emit extra instructions on the CAS critical path.
> > 
> > Add an internal __rte_ring_compare_and_swap() wrapper that calls
> > __sync_bool_compare_and_swap() directly, which keeps the original
> > instruction sequence. Add equivalent function for MSVC.  
> 
> In fact, in rte_ring we do have 2 implementations of the same core functions:
> lib/ring/rte_ring_c11_pvt.h  - uses C11 atomics
> lib/ring/rte_ring_generic_pvt.h - uses legacy instructions (smp_mb, etc.).
> If we are going to remove these legacy instructions anyway (or reimplement
> them using C11 atomics), then there is probably no point in keeping
> rte_ring_generic_pvt.h.
> Konstantin

I have been deep diving into why C11 atomics give a 20-30% performance
drop versus the rte_atomic32 version. So far it comes down to the GCC
optimizer not doing as well with the C11 builtins as with the assembly.
The C11 form, combined with the excessive use of always_inline, consumes
more registers.
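For readers less familiar with the "failure-writeback semantic" mentioned in the quoted patch description, here is a minimal sketch of the operand difference, assuming the GCC/Clang builtins `__sync_bool_compare_and_swap()` and `__atomic_compare_exchange_n()`. The function names are hypothetical and this is not the DPDK ring code. It only shows that the C11-style CAS treats the expected value as an in/out operand; it does not by itself demonstrate the measured regression.

```c
#include <stdint.h>

/*
 * Hypothetical helpers, for illustration only (not DPDK functions).
 * Each atomically advances *head by n and returns the old value,
 * roughly as a multi-producer head move would.
 */

/* Legacy style: the expected value is passed by value and only a
 * success flag comes back, so the loop reloads *head on every retry. */
static inline uint32_t
advance_head_sync(volatile uint32_t *head, uint32_t n)
{
	uint32_t old;

	do {
		old = *head;
	} while (!__sync_bool_compare_and_swap(head, old, old + n));
	return old;
}

/* C11 style: on failure the builtin stores the value it observed into
 * 'old', so 'old' is an in/out operand and no explicit reload is needed.
 * Relaxed ordering is used here only to keep the sketch small. */
static inline uint32_t
advance_head_c11(uint32_t *head, uint32_t n)
{
	uint32_t old = __atomic_load_n(head, __ATOMIC_RELAXED);

	while (!__atomic_compare_exchange_n(head, &old, old + n,
			1 /* weak */, __ATOMIC_RELAXED, __ATOMIC_RELAXED))
		;
	return old;
}

/* Shows the writeback directly. Assumes wrong_expected != *obj, so the
 * (strong) compare-exchange fails and copies *obj into 'expected'. */
static inline uint32_t
c11_cas_failure_writeback(uint32_t *obj, uint32_t wrong_expected)
{
	uint32_t expected = wrong_expected;

	(void)__atomic_compare_exchange_n(obj, &expected, 0, 0,
			__ATOMIC_RELAXED, __ATOMIC_RELAXED);
	return expected;
}
```

On x86-64 both builtins are typically lowered to a `lock cmpxchg` instruction; any difference lies in the surrounding code the compiler generates, for example keeping `old` live as an in/out operand across the loop.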
