Re: Next-level bug in SRCU implementation of RCU Tasks Trace + PREEMPT_RT

Boqun Feng Thu, 19 Mar 2026 10:35:09 -0700

On Thu, Mar 19, 2026 at 05:59:40PM +0100, Kumar Kartikeya Dwivedi wrote:
> On Thu, 19 Mar 2026 at 17:48, Boqun Feng <[email protected]> wrote:
> >
> > On Thu, Mar 19, 2026 at 05:33:50PM +0100, Sebastian Andrzej Siewior wrote:
> > > On 2026-03-19 09:27:59 [-0700], Boqun Feng wrote:
> > > > On Thu, Mar 19, 2026 at 10:03:15AM +0100, Sebastian Andrzej Siewior wrote:
> > > > > Please just use the queue_delayed_work() with a delay >0.
> > > > >
> > > >
> > > > That doesn't work since queue_delayed_work() with a positive delay will
> > > > still acquire the timer base lock, and BPF can instrument code that runs
> > > > with the timer base lock held, i.e. call call_srcu() with that lock held.
> > > >
> > > > irq_work on the other hand doesn't use any locking.
> > >
> > > Could we please restrict BPF somehow so it doesn't roam free? It is
> > > absolutely awful to have irq_work() in call_srcu() just because it
> > > might acquire locks.
> > >
> >
> > I agree it's not RCU's fault ;-)
> >
> > I guess it'll be difficult to restrict BPF. However, maybe BPF can call
> > call_srcu() in irq_work instead? Or use a more systematic defer mechanism
> > that allows BPF to defer any lock-acquiring function to a different
> > context. (We have a similar issue where BPF cannot call kfree_rcu() in
> > some cases, IIRC.)
> >
> > But we need to fix this in v7.0, so this short-term fix is still needed.
> >
> 
> I don't think this is an option, even longer term. We already do it
> when it's incorrect to invoke call_rcu() or any other API in a
> specific context (e.g., NMI, where we punt it using irq_work).
> However, the case reported in this thread is different. It is an
> existing user that worked fine before and is broken now. We were
> using call_rcu_tasks_trace() in scx callbacks with rq->lock held
> without problems, so the conversion underneath to call_srcu() should
> remain transparent in this respect.
>


I'm not sure that's a real argument here. The kernel doesn't have a stable
internal API, which is what allows developers to refactor code into a saner
shape. There are currently multiple issues suggesting that we may need a
defer mechanism in BPF core, and if it makes the code easier to reason
about, then why not? Think of it as a process in which we learn about all
the defer patterns that BPF currently needs and wrap them in a nice and
maintainable way.
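
To make the idea concrete, here is a minimal userspace sketch of such a
defer mechanism. It is not kernel code and not a proposed patch. All names
(struct deferred_call, struct defer_queue, defer_call(), run_deferred(),
struct counted_call, count_one()) are hypothetical, and C11 atomics stand in
for the kernel's lock-free primitives. The restricted context only pushes a
node with a compare-and-swap and takes no lock; a later context drains the
queue and runs the callbacks that may acquire locks.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct deferred_call {
	struct deferred_call *next;
	void (*func)(struct deferred_call *call);
};

struct defer_queue {
	_Atomic(struct deferred_call *) head;
};

/*
 * Producer side: usable from a context that must not take locks,
 * because it only performs a compare-and-swap loop on the list head.
 */
void defer_call(struct defer_queue *q, struct deferred_call *call,
		void (*func)(struct deferred_call *call))
{
	struct deferred_call *old;

	assert(call != NULL && func != NULL);
	call->func = func;
	old = atomic_load_explicit(&q->head, memory_order_relaxed);
	do {
		call->next = old;
	} while (!atomic_compare_exchange_weak_explicit(&q->head, &old, call,
							 memory_order_release,
							 memory_order_relaxed));
}

/*
 * Consumer side: runs later, from a context where the deferred work
 * (standing in for a lock-taking call) is allowed. Detaches the whole
 * list at once and returns the number of callbacks invoked. Callbacks
 * run newest-first, since the producer pushes at the head.
 */
int run_deferred(struct defer_queue *q)
{
	struct deferred_call *call;
	int n = 0;

	call = atomic_exchange_explicit(&q->head, NULL, memory_order_acquire);
	while (call) {
		/* Read ->next first: the callback may reuse or free the node. */
		struct deferred_call *next = call->next;

		call->func(call);
		call = next;
		n++;
	}
	return n;
}

/* Example payload: a counter increment stands in for the real deferred work. */
struct counted_call {
	struct deferred_call base;	/* must stay first for the cast below */
	int *counter;
};

void count_one(struct deferred_call *call)
{
	struct counted_call *c = (struct counted_call *)call;

	(*c->counter)++;
}
```

A caller would embed struct deferred_call in its own object, call
defer_call() from the restricted context, and arrange for run_deferred() to
be invoked from a safe context afterwards. How that second step gets
triggered (irq_work or something else) is exactly the open question in this
thread and is left out of the sketch.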

Regards,
Boqun

> > Regards,
> > Boqun
> >
> > > > Regards,
> > > > Boqun
> > > >
> > > Sebastian
