On Thu, Mar 19, 2026 at 02:42:56PM -0400, Joel Fernandes wrote:
> On 3/19/2026 1:44 PM, Boqun Feng wrote:
> > On Thu, Mar 19, 2026 at 06:02:44PM +0100, Sebastian Andrzej Siewior wrote:
> >> On 2026-03-19 09:48:16 [-0700], Boqun Feng wrote:
> >>> I agree it's not RCU's fault ;-)
> >>
> >> I never claimed it is anyone's fault. I just see that BPF should be able
> >> to do things which kgdb would not be allowed to.
> >>
> >>> I guess it'll be difficult to restrict BPF, however maybe BPF can call
> >>> call_srcu() in irq_work instead? Or a more systematic defer mechanism
> >>> that allows BPF to defer any lock holding functions to a different
> >>> context. (We have a similar issue that BPF cannot call kfree_rcu() in
> >>> some cases IIRC).
> >>>
> >>> But we need to fix this in v7.0, so this short-term fix is still needed.
> >>
> >> I would prefer something substantial before we rush to get a quick fix
> >> and move on.
> >>
> > 
> > The quick fix here is really "restore the previous behavior of
> > call_rcu_tasks_trace() in call_srcu()", and the future work will
> 
> Unfortunately reverting c27cea4416a3 ("rcu: Re-implement RCU Tasks Trace in
> terms of SRCU-fast") is tricky since the original body of RCU Tasks Trace code
> > is deleted. Perhaps we should have added an easier escape hatch; lesson
> > learnt :)
> 
> > naturally happen: if the extra irq_work layer turns out to cause issues
> > for other SRCU users, then we need to fix them as well. Otherwise, there
> > is no real need to avoid the extra irq_work hop. So I *think* it's OK
> > ;-)
> > 
> > Cleaning up all the ad-hoc irq_work usages in BPF is another thing,
> > which can happen if we learn about all the cases and have a good design.
> > 
> >> If we could get that irq_work() part only for BPF where it is required
> >> then it would be already a step forward.
> >>
> > 
> > I'm happy to include that (i.e. using Qiang's suggestion) if Joel also
> > agrees.
> 
> Sure, I am OK with this sort of short-term fix, but I worry that it still
> does not fix the issues due to the tasks-trace conversion. In particular,
> it doesn't fix the issue Andrea reported AFAICS, because there is a
> dependency on pool->lock? See:
> https://lore.kernel.org/all/abjzvz_tL_siV17s@gpd4/
> 
> That happens precisely because of the queue_delayed_work() happening from the
> SRCU tasks-trace specific BPF, right?
> 
> It looks something like this, due to a combination of SRCU, scheduler and WQ locks:
> 
> srcu_usage.lock -> pool->lock -> pi_lock -> rq->__lock
>        ^                                       |
>        |                                       |
>        +----------- DEADLOCK CYCLE ------------+
> 
> >> Long term it would be nice if we could avoid calling this while locks
> >> are held. I think call_rcu() can't be used under rq/pi lock, but timers
> >> should be fine.
> >>
> >> Is this rq/pi locking originating from "regular" BPF code or sched_ext?
> >>
> > 
> > I think if you have any tracepoint (including traceable functions) under
> > rq/pi locking, then potentially BPF can call call_srcu() there.
> 
> > 
> > The root cause of the issues is that BPF is actually like an NMI unless
> > the code is noinstr (there is a rabbit hole about BPF calling
> > call_srcu() while it's instrumenting call_srcu() itself). And the right
> > way to solve all the issues is to have a general defer mechanism for
> > BPF.
> Will that really solve the above mentioned issue though that Andrea reported?
> 

It should, since we use an irq_work to call queue_work() instead of
calling queue_work() directly, which breaks the srcu_usage.lock ->
pool->lock dependency. But yes, some tests would be good; the code is at:

        https://git.kernel.org/pub/scm/linux/kernel/git/boqun/linux.git srcu-fix

The related commits are:

78dcdc35d85f rcu: Use an intermediate irq_work to start process_srcu()
0490fe4b5c39 srcu: Use raw spinlocks so call_srcu() can be used under preempt_disable()

One fixes the raw spinlock vs spinlock issue, the other fixes the
deadlock.

Regards,
Boqun

> +Andrea, +Steve as well.
> 
> thanks,
> 
> --
> Joel Fernandes
> 
