On Wed, Oct 29, 2025 at 8:38 PM <[email protected]> wrote:
>
>
> ```
> commit 8ae9fd98fe53c45a1a89adf6d254c2db90d26544
> Author: Menglong Dong <[email protected]>
>
> bpf: use rqspinlock for lru map
>
> This patch converts raw_spinlock_t to rqspinlock_t in BPF LRU maps to
> prevent NMI context deadlocks. The new lock API can fail gracefully
> instead of deadlocking, returning an error code.
>
> Link:
> https://lore.kernel.org/bpf/CAEf4BzbTJCUx0D=zjx6+5m5iighwlzap94hnw36zmdhaf4-...@mail.gmail.com/
>
> > diff --git a/kernel/bpf/bpf_lru_list.c b/kernel/bpf/bpf_lru_list.c
> > index e7a2fc605..38fddcb1e 100644
> > --- a/kernel/bpf/bpf_lru_list.c
> > +++ b/kernel/bpf/bpf_lru_list.c
>
> [ ... ]
>
> > @@ -466,23 +470,26 @@ static struct bpf_lru_node *bpf_common_lru_pop_free(struct bpf_lru *lru,
> >         do {
> >                 steal_loc_l = per_cpu_ptr(clru->local_list, steal);
> >
> > -               raw_spin_lock_irqsave(&steal_loc_l->lock, flags);
> > +               if (raw_res_spin_lock_irqsave(&steal_loc_l->lock, flags))
> > +                       goto out_next;
> >
> >                 node = __local_list_pop_free(steal_loc_l);
> >                 if (!node)
> >                         node = __local_list_pop_pending(lru, steal_loc_l);
> >
> > -               raw_spin_unlock_irqrestore(&steal_loc_l->lock, flags);
> > +               raw_res_spin_unlock_irqrestore(&steal_loc_l->lock, flags);
> >
> > +out_next:
> >                 steal = cpumask_next_wrap(steal, cpu_possible_mask);
> >         } while (!node && steal != first_steal);
> >
> >         loc_l->next_steal = steal;
> >
> >         if (node) {
> > -               raw_spin_lock_irqsave(&loc_l->lock, flags);
> > +               if (raw_res_spin_lock_irqsave(&loc_l->lock, flags))
> > +                       return NULL;
>                                 ^^^^^^^^^^^
>
> Can this leak the node? At this point in bpf_common_lru_pop_free(),
> a node was successfully stolen from another CPU's list. The
> __local_list_pop_free() or __local_list_pop_pending() call removed
> the node from steal_loc_l, and steal_loc_l was unlocked. The node
> now exists but is not in any list.
>
> When raw_res_spin_lock_irqsave() fails here, the function returns
> NULL, but the stolen node is never added to the local pending list
> and never returned to any list. The node becomes orphaned.
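To make the failure mode above easier to see, here is a minimal userspace model of the same pattern. It is an illustration only: toy_node, toy_list, toy_pop(), toy_push() and steal_one() are made up for this sketch, and plain pthread trylocks stand in for the failable rqspinlock API. The second trylock failure in steal_one() corresponds to the `return NULL` flagged above: it drops the only reference to a node that is no longer on any list.

```
#include <pthread.h>
#include <stddef.h>

/* Toy stand-ins for the LRU nodes and the per-CPU local lists. */
struct toy_node {
	struct toy_node *next;
};

struct toy_list {
	pthread_mutex_t lock;
	struct toy_node *head;
};

/* Unlink and return the first node, or NULL if the list is empty. */
static struct toy_node *toy_pop(struct toy_list *l)
{
	struct toy_node *n = l->head;

	if (n)
		l->head = n->next;
	return n;
}

/* Link @n at the head of @l. */
static void toy_push(struct toy_list *l, struct toy_node *n)
{
	n->next = l->head;
	l->head = n;
}

/*
 * Shape of the patched steal path: trylock models a lock API that can
 * fail.  The second failure path returns NULL while this function holds
 * the only reference to @n, which is no longer on any list -- that is
 * the leak the review points out.
 */
static struct toy_node *steal_one(struct toy_list *donor, struct toy_list *local)
{
	struct toy_node *n;

	if (pthread_mutex_trylock(&donor->lock))
		return NULL;			/* fine: nothing owned yet */

	n = toy_pop(donor);			/* n is unlinked; we own it */
	pthread_mutex_unlock(&donor->lock);

	if (!n)
		return NULL;

	if (pthread_mutex_trylock(&local->lock))
		return NULL;			/* BUG: n is orphaned here */

	toy_push(local, n);			/* "pending" list tracks in-use nodes */
	pthread_mutex_unlock(&local->lock);
	return n;
}

int main(void)
{
	struct toy_node nodes[2] = { { NULL }, { NULL } };
	struct toy_list donor = { .lock = PTHREAD_MUTEX_INITIALIZER, .head = NULL };
	struct toy_list local = { .lock = PTHREAD_MUTEX_INITIALIZER, .head = NULL };

	toy_push(&donor, &nodes[0]);
	toy_push(&donor, &nodes[1]);

	return steal_one(&donor, &local) ? 0 : 1;
}
```

The single-threaded main() never actually hits the failing branch; it is only there so the sketch builds and runs (cc -pthread).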
The AI is right. Here and in other places the conversion can simply leak objects: once a node has been popped off a list, every failure path has to hand it off somewhere. res_spin_lock() is not a drop-in replacement for raw_spin_lock(); the whole thing needs to be thought through.
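Reusing the toy types and helpers from the sketch above (still an illustration, not a proposed fix for bpf_lru_list.c), one shape of an error path that keeps ownership of the node accounted for is to hand it back to the donor list before returning NULL:

```
/*
 * Same toy types and helpers as in the previous sketch.  On failure to
 * take the local lock, give the stolen node back to the donor list
 * instead of dropping it.
 */
static struct toy_node *steal_one_no_leak(struct toy_list *donor,
					  struct toy_list *local)
{
	struct toy_node *n;

	if (pthread_mutex_trylock(&donor->lock))
		return NULL;

	n = toy_pop(donor);
	pthread_mutex_unlock(&donor->lock);

	if (!n)
		return NULL;

	if (pthread_mutex_trylock(&local->lock)) {
		/*
		 * We still own @n; put it back where it came from so it
		 * stays reachable.  Note this uses a blocking lock that
		 * cannot fail -- rqspinlock offers no such guarantee,
		 * which is where the real conversion gets hard.
		 */
		pthread_mutex_lock(&donor->lock);
		toy_push(donor, n);
		pthread_mutex_unlock(&donor->lock);
		return NULL;
	}

	toy_push(local, n);
	pthread_mutex_unlock(&local->lock);
	return n;
}
```

The catch, and the reason the conversion is not mechanical, is visible in the fallback: here it is a blocking pthread_mutex_lock() that always succeeds, but with rqspinlock every acquisition on the put-back path can fail too, and the real code additionally has the per-node type/cpu bookkeeping that __local_list_add_pending() does, which any hand-back would have to keep consistent.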

