To avoid information loss, I have combined my replies to two of Wathsala's emails into one.
> > > The function __rte_ring_headtail_move_head() assumes that the barrier
> > > (fence) between the load of the head and the load-acquire of the
> > > opposing tail guarantees the following: if a first thread reads tail
> > > and then writes head, and a second thread reads the new value of head
> > > and then reads tail, then it should observe the same (or a later)
> > > value of tail.
> > >
> > > This assumption is incorrect under the C11 memory model. If the barrier
> > > (fence) is intended to establish a total ordering of ring operations,
> > > it fails to do so. Instead, the current implementation only enforces a
> > > partial ordering, which can lead to unsafe interleavings. In particular,
> > > some partial orders can cause underflows in free-slot or available-
> > > element computations, potentially resulting in data corruption.
> >
> > Hmm... sounds exactly like the problem from the patch we discussed
> > earlier this year:
> > https://patchwork.dpdk.org/project/dpdk/patch/20250521111432.207936-4-konstantin.anan...@huawei.com/
> > In two words:
> > "... thread can see 'latest' 'cons.head' value, with 'previous' value
> > for 'prod.tail' or vice versa.
> > In other words: 'cons.head' value depends on 'prod.tail', so before
> > making the latest 'cons.head' value visible to other threads, we need
> > to ensure that the latest 'prod.tail' is also visible."
> > Is that the one?
>
> Yes, the behavior occurs under RCpc (LDAPR) but not under RCsc (LDAR),
> which is why we didn't catch it earlier. A fuller explanation, with
> Herd7 simulations, is in the blog post linked in the cover letter.
>
> https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/when-a-barrier-does-not-block-the-pitfalls-of-partial-order

I see, so now it is reproducible with the core rte_ring on real HW.

> > > The issue manifests when a CPU first acts as a producer and later as a
> > > consumer. In this scenario, the barrier assumption may fail when another
> > > core takes the consumer role. A Herd7 litmus test in C11 can demonstrate
> > > this violation. The problem has not been widely observed so far because:
> > > (a) on strong memory models (e.g., x86-64) the assumption holds, and
> > > (b) on relaxed models with RCsc semantics the ordering is still strong
> > > enough to prevent hazards.
> > > The problem becomes visible only on weaker models, when load-acquire is
> > > implemented with RCpc semantics (e.g. some AArch64 CPUs which support
> > > the LDAPR and LDAPUR instructions).
> > >
> > > Three possible solutions exist:
> > >
> > > 1. Strengthen ordering by upgrading release/acquire semantics to
> > >    sequential consistency. This requires using seq-cst for stores,
> > >    loads, and CAS operations. However, this approach introduces a
> > >    significant performance penalty on relaxed-memory architectures.
> > >
> > > 2. Establish a safe partial order by enforcing a pair-wise
> > >    happens-before relationship between threads of the same role, by
> > >    converting the CAS and the preceding load of the head to release
> > >    and acquire respectively. This approach makes the original barrier
> > >    assumption unnecessary and allows its removal.
> >
> > For the sake of clarity, can you outline what the exact code changes
> > for approach #2 would be?
> > Same as in that patch:
> > https://patchwork.dpdk.org/project/dpdk/patch/20250521111432.207936-4-konstantin.anan...@huawei.com/
> > Or something different?
>
> Sorry, I missed the latter half of your comment before.
> Yes, you have proposed the same solution there.

Ok, thanks for confirmation.

> > > 3. Retain partial ordering but ensure only safe partial orders are
> > >    committed. This can be done by detecting underflow conditions
> > >    (producer < consumer) and quashing the update in such cases.
> > >    This approach makes the original barrier assumption unnecessary
> > >    and allows its removal.
> > >
> > > This patch implements solution (3) for performance reasons.
> > >
> > > Signed-off-by: Wathsala Vithanage <wathsala.vithan...@arm.com>
> > > Signed-off-by: Ola Liljedahl <ola.liljed...@arm.com>
> > > Reviewed-by: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>
> > > Reviewed-by: Dhruv Tripathi <dhruv.tripa...@arm.com>
> > > ---
> > >  lib/ring/rte_ring_c11_pvt.h | 10 +++++++---
> > >  1 file changed, 7 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/lib/ring/rte_ring_c11_pvt.h b/lib/ring/rte_ring_c11_pvt.h
> > > index b9388af0da..e5ac1f6b9e 100644
> > > --- a/lib/ring/rte_ring_c11_pvt.h
> > > +++ b/lib/ring/rte_ring_c11_pvt.h
> > > @@ -83,9 +83,6 @@ __rte_ring_headtail_move_head(struct rte_ring_headtail *d,
> > >  		/* Reset n to the initial burst count */
> > >  		n = max;
> > >
> > > -		/* Ensure the head is read before tail */
> > > -		rte_atomic_thread_fence(rte_memory_order_acquire);
> > > -
> > >  		/* load-acquire synchronize with store-release of ht->tail
> > >  		 * in update_tail.
> > >  		 */
> >
> > But then cons.head can be read before prod.tail (and vice versa), right?
>
> Right, we let it happen but eliminate any resulting states that are
> semantically incorrect at the end.

Two comments here:

1) I think it is probably safer to do the check like that:
   if (*entries > ring->capacity)
      ...

2) My concern is that without forcing a proper read ordering (cons.head
first, then prod.tail) we re-introduce a window for all sorts of ABA-like
problems, which were addressed by:

ring: guarantee load/load order in enqueue and dequeue
commit 9bc2cbb007c0a3335c5582357ae9f6d37ea0b654
Author: Jia He <justin...@arm.com>
Date:   Fri Nov 10 03:30:42 2017 +0000

> > > @@ -99,6 +96,13 @@ __rte_ring_headtail_move_head(struct rte_ring_headtail *d,
> > >  		 */
> > >  		*entries = (capacity + stail - *old_head);
> > >
> > > +		/*
> > > +		 * Ensure the entries calculation was not based on a stale
> > > +		 * and unsafe stail observation that causes underflow.
> > > +		 */
> > > +		if ((int)*entries < 0)
> > > +			*entries = 0;
> > > +
> > >  		/* check that we have enough room in ring */
> > >  		if (unlikely(n > *entries))
> > >  			n = (behavior == RTE_RING_QUEUE_FIXED) ?
> > > --
> > > 2.43.0
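
To make comment 1) a bit more concrete, here is a tiny standalone comparison
of the two checks. It is only an illustration: the helper functions and their
names are made up for this email and are not the ring code itself.

/* underflow_check.c: illustrative comparison of the two sanity checks */
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* check from the patch: clamp when the result wraps into the "negative"
 * range of a signed cast */
static uint32_t
clamp_signed(uint32_t capacity, uint32_t stail, uint32_t old_head)
{
	uint32_t entries = capacity + stail - old_head;

	if ((int)entries < 0)
		entries = 0;
	return entries;
}

/* check suggested above: any value above capacity is impossible for the
 * ring, whether or not it wrapped past 2^31 */
static uint32_t
clamp_capacity(uint32_t capacity, uint32_t stail, uint32_t old_head)
{
	uint32_t entries = capacity + stail - old_head;

	if (entries > capacity)
		entries = 0;
	return entries;
}

int main(void)
{
	/* stale tail: the observed head is already past capacity + tail */
	printf("%" PRIu32 " %" PRIu32 "\n",
	       clamp_signed(1024, 0, 2000), clamp_capacity(1024, 0, 2000));
	/* hypothetical inconsistent snapshot where the observed tail is
	 * ahead of the observed head: only the capacity check rejects it */
	printf("%" PRIu32 " %" PRIu32 "\n",
	       clamp_signed(1024, 3000, 1000), clamp_capacity(1024, 3000, 1000));
	return 0;
}

The signed cast only catches results that wrap past 2^31, while the capacity
comparison rejects any value the ring cannot legally hold.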
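
And, for reference on approach #2 discussed above, a rough standalone model of
"acquire on the head load, release on the head CAS" written with plain C11
atomics. This is only a sketch of my reading of that description: ring_ht,
move_head and the burst-only behaviour are invented here; the real changes
would of course go into __rte_ring_headtail_move_head().

/* approach2_sketch.c: standalone model of approach #2, not DPDK code */
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

struct ring_ht {
	_Atomic uint32_t head;
	_Atomic uint32_t tail;
};

/*
 * Move d->head forward by up to max slots, using s->tail to compute how many
 * slots are usable (burst behaviour only). The acquire load of d->head and
 * the release CAS on d->head give a pair-wise happens-before between threads
 * of the same role: whoever observes the new head also observes (at least)
 * the s->tail value this thread used, so no separate fence is needed.
 */
static uint32_t
move_head(struct ring_ht *d, struct ring_ht *s, uint32_t capacity,
	  uint32_t max, uint32_t *old_head)
{
	uint32_t n, stail, entries, new_head;

	for (;;) {
		n = max;
		/* acquire: pairs with the release CAS below done by peers */
		*old_head = atomic_load_explicit(&d->head, memory_order_acquire);
		/* acquire: pairs with the store-release of s->tail on update */
		stail = atomic_load_explicit(&s->tail, memory_order_acquire);
		entries = capacity + stail - *old_head;
		if (n > entries)
			n = entries;
		if (n == 0)
			return 0;
		new_head = *old_head + n;
		/* release on success publishes the new head after the tail
		 * observation above; on failure we simply retry */
		if (atomic_compare_exchange_strong_explicit(&d->head, old_head,
				new_head, memory_order_release,
				memory_order_relaxed))
			return n;
	}
}

int main(void)
{
	struct ring_ht prod = {0}, cons = {0};
	uint32_t old_head;

	/* empty ring of capacity 8: a producer can claim up to 4 of 8 slots */
	printf("%u\n", (unsigned)move_head(&prod, &cons, 8, 4, &old_head));
	return 0;
}

The point is that a peer thread which observes new_head through its own
acquire load also inherits the tail observation made before the release CAS,
which is what makes the removed fence unnecessary under approach #2.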