On 2024-09-21 23:07, Lai Jiangshan wrote:
+/* + * hpref_hp_get: Obtain a reference to a stable object, protected either + * by hazard pointer (fast-path) or using reference + * counter as fall-back. + */ +static inline +bool hpref_hp_get(struct hpref_node **node_p, struct hpref_ctx *ctx) +{ + int cpu = rseq_current_cpu_raw(); + struct hpref_percpu_slots *cpu_slots = rseq_percpu_ptr(hpref_percpu_slots, cpu); + struct hpref_slot *slot = &cpu_slots->slots[cpu_slots->current_slot]; + bool use_refcount = false; + struct hpref_node *node, *node2; + unsigned int next_slot; + +retry: + node = uatomic_load(node_p, CMM_RELAXED); + if (!node) + return false; + /* Use rseq to try setting current slot hp. Store B. */ + if (rseq_load_cbne_store__ptr(RSEQ_MO_RELAXED, RSEQ_PERCPU_CPU_ID, + (intptr_t *) &slot->node, (intptr_t) NULL, + (intptr_t) node, cpu)) { + slot = &cpu_slots->slots[HPREF_EMERGENCY_SLOT];Can @cpu be possibly changed? if it can, it seems @cpu and @cpu_slots should be updated first.
Indeed, if migration happens between rseq_current_cpu_raw() and execution of rseq_load_cbne_store__ptr(), it will cause this second operation to fail. This in turn could cause the reader to retry for a long time (possibly forever) until it finally migrates back to the original CPU. As you suggest, we should re-load cpu and cpu_slots after failure here. More specifically, we should re-load those when the rseq c.s. fails with -1, which means it was abort or there was a cpu number mismatch. If the rseq c.s. returns 1, this means the slot did not contain NULL, so all we need to do is move over to the next slot. While applying your suggested change, I noticed that I can improve the fast-path by removing the notion of "current_slot" number, and thus remove increment of this hint from the fast path. The fast path will instead just scan all 8 slots trying to find a NULL one. This also lessens the odds that the fast-path must fallback to refcount in case of concurrent use. It provides a 50% performance improvement for the fast path with barrier/membarrier pairing. I've updated the https://github.com/compudj/userspace-rcu-dev/tree/hpref volatile feature branch with these changes. I'll give others some time to provide feedback on the overall idea before sending out an updated patch. Thanks for your feedback! Mathieu
+ use_refcount = true; + /* + * This may busy-wait for another reader using the + * emergency slot to transition to refcount. + */ + caa_cpu_relax(); + goto retry; + }
-- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com

