On Fri, 2026-05-15 at 12:34 +0800, Kaitao Cheng wrote: > > 在 2026/5/14 09:50, Alexei Starovoitov 写道: > > On Wed May 13, 2026 at 3:53 PM PDT, Eduard Zingerman wrote: > > > On Tue, 2026-05-12 at 06:41 +0000, [email protected] wrote: > > > > > > [...] > > > > > > > When a BPF program holds an owning or refcount-acquired reference to > > > > one of these nodes (node X), which is structurally supported because > > > > __bpf_obj_drop_impl() uses refcount_dec_and_test() and only frees at > > > > refcount 0, a concurrent push to a DIFFERENT bpf_list_head becomes a > > > > corruption: > > > > > > > > CPU 0 (bpf_list_head_free, lock released) CPU 1 (BPF prog, refcount X) > > > > ----------------------------------------- ---------------------------- > > > > (owner of X == NULL, X linked in drain) > > > > bpf_list_push_back(other, X) > > > > __bpf_list_add: > > > > spin_lock() > > > > cmpxchg(X->owner, NULL, > > > > POISON) -> OK > > > > > > > > list_add_tail(&X->list_head, > > > > other_head) > > > > -> overwrites X->next, > > > > X->prev, corrupts > > > > other_head's chain > > > > because X is still > > > > stitched into drain > > > > pos = drain.next; (may be X or neighbor using X's stale next) > > > > list_del_init(pos); reads X->next/prev now pointing into other_head, > > > > corrupts other_head's list and/or drain > > > > > > > > > Kaitao, this scenario seem plausible, could you please comment on it? > > > > I think bot is correct. > > This patch looks buggy. > > It seems to me an optimization that breaks the concurrent logic. > > May be just drop this patch and reorder the other one, so that bot > > sees nonown suffix logic first. > > This patch is still necessary because it addresses the problem discussed > in this thread: > https://lore.kernel.org/all/[email protected]/ > > The patch does have a bug, however. To fix the issues we are seeing now, > I propose the additional changes below and would appreciate feedback. > > --- a/kernel/bpf/helpers.c > +++ b/kernel/bpf/helpers.c > @@ -2263,8 +2263,10 @@ void bpf_list_head_free(const struct btf_field *field, > void *list_head, > if (!head->next || list_empty(head)) > goto unlock; > list_for_each_safe(pos, n, head) { > - WRITE_ONCE(container_of(pos, > - struct bpf_list_node_kern, list_head)->owner, NULL); > + struct bpf_list_node_kern *node; > + > + node = container_of(pos, struct bpf_list_node_kern, > list_head); > + WRITE_ONCE(node->owner, BPF_PTR_POISON); > list_move_tail(pos, &drain); > } > unlock: > @@ -2272,8 +2274,12 @@ void bpf_list_head_free(const struct btf_field *field, > void *list_head, > __bpf_spin_unlock_irqrestore(spin_lock); > > while (!list_empty(&drain)) { > + struct bpf_list_node_kern *node; > + > pos = drain.next; > + node = container_of(pos, struct bpf_list_node_kern, > list_head); > list_del_init(pos); > + WRITE_ONCE(node->owner, NULL);
I think this still leaves a short race window open. Why does the .owner has field to be NULL? Can the logic that implies for it to be NULL be extended to accept POISON as well? > /* The contained type can also have resources, including a > * bpf_list_head which needs to be freed. > */ [...]

