> From: Stephen Hemminger [mailto:[email protected]]
> Sent: Monday, 25 May 2026 17.35
>
> On Mon, 25 May 2026 07:41:13 +0000
> Konstantin Ananyev <[email protected]> wrote:
>
> > Hi Stephen,
> >
> > > The rte_atomic32_cmpset is deprecated. Initial attempts at
> > > changing this with direct conversion to
> > > rte_atomic_compare_exchange_weak_explicit()
> > > regressed MP/MC contended performance on x86 by 10-30%,
> > > because the C11 builtin's failure-writeback semantic forces
> > > GCC to emit extra instructions on the CAS critical path.
> > >
> > > Add an internal __rte_ring_compare_and_swap() wrapper that calls
> > > __sync_bool_compare_and_swap() directly, which keeps the original
> > > instruction sequence. Add equivalent function for MSVC.
> >
> > In fact, in rte_ring we do have 2 implementations of the same core functions:
> > lib/ring/rte_ring_c11_pvt.h - uses C11 atomics
> > lib/ring/rte_ring_generic_pvt.h - uses legacy instructions (smp_mb, extra),
> > If we going remove these legacy instructions anyway (or reimplementing them using C11 atomics),
> > then there is probably no point to keep rte_ring_generic_pvt.h.
> > Konstantin
>
> Have been deep diving into why C11 atomics give 20-30% performance
> drop versus atomic32 version. So far it comes down to GCC optimizer
> not doing as well with C11 versus assembly. The C11 form with the
> excessive use of always_inline consumes more registers.

Just an idea: Perhaps adding "const" and/or "restrict" to relevant parameters will give the optimizer the information it needs?
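
To make the idea concrete, here is a rough sketch of the two CAS forms side by side, with "restrict" on the C11 variant. The names cas_sync() and cas_c11() are made up for illustration and are not DPDK functions, the memory orders are only an assumption, and I have not measured whether the qualifier changes the generated code at all:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Legacy-style CAS (hypothetical helper): 'expected' is passed by value,
 * so nothing has to be written back when the compare fails.
 */
static inline bool
cas_sync(volatile uint32_t *dst, uint32_t expected, uint32_t desired)
{
	return __sync_bool_compare_and_swap(dst, expected, desired);
}

/*
 * C11-style CAS (hypothetical helper): on failure the current value of
 * *dst is stored into *expected. 'restrict' tells the compiler that dst
 * and expected do not alias; whether that helps GCC here is unverified.
 */
static inline bool
cas_c11(_Atomic uint32_t *restrict dst, uint32_t *restrict expected,
	uint32_t desired)
{
	return atomic_compare_exchange_weak_explicit(dst, expected, desired,
			memory_order_acquire, memory_order_relaxed);
}
```

The behavioural difference is the write to *expected on failure: the caller of cas_c11() gets the fresh value for free, but the compiler has to keep that store (and the register for it) alive on the failure path.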

