> On Jul 6, 2025, at 1:26 PM, Paul E. McKenney <[email protected]> wrote: > > On Sun, Jul 06, 2025 at 01:13:31PM -0400, Joel Fernandes wrote: >>> On 7/6/2025 1:08 PM, Paul E. McKenney wrote: >>> On Sat, Jul 05, 2025 at 04:39:15PM -0400, Joel Fernandes wrote: >>>> Signed-off-by: Joel Fernandes <[email protected]> >>> >>> Definitely headed in the right direction, though it does need just a >>> little bit more detail in the commit log. ;-) >>> >>> Also a few comments and questions interspersed below. >>> >>> Thanx, Paul >>> >>>> --- >>>> kernel/rcu/tree.h | 11 ++++++++++- >>>> kernel/rcu/tree_plugin.h | 29 ++++++++++++++++++++++------- >>>> 2 files changed, 32 insertions(+), 8 deletions(-) >>>> >>>> diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h >>>> index 3830c19cf2f6..f8f612269e6e 100644 >>>> --- a/kernel/rcu/tree.h >>>> +++ b/kernel/rcu/tree.h >>>> @@ -174,6 +174,15 @@ struct rcu_snap_record { >>>> unsigned long jiffies; /* Track jiffies value */ >>>> }; >>>> >>>> +/* >>>> + * The IRQ work (deferred_qs_iw) is used by RCU to get scheduler's >>>> attention. >>>> + * It can be in one of the following states: >>>> + * - DEFER_QS_IDLE: An IRQ work was never scheduled. >>>> + * - DEFER_QS_PENDING: An IRQ work was scheduler but never run. >>>> + */ >>>> +#define DEFER_QS_IDLE 0 >>>> +#define DEFER_QS_PENDING 1 >>> >>> Having names for the states is good! >>> >>>> + >>>> /* Per-CPU data for read-copy update. */ >>>> struct rcu_data { >>>> /* 1) quiescent-state and grace-period handling : */ >>>> @@ -192,7 +201,7 @@ struct rcu_data { >>>> /* during and after the last grace */ >>>> /* period it is aware of. */ >>>> struct irq_work defer_qs_iw; /* Obtain later scheduler attention. */ >>>> - bool defer_qs_iw_pending; /* Scheduler attention pending? */ >>>> + int defer_qs_iw_pending; /* Scheduler attention pending? */ >>>> struct work_struct strict_work; /* Schedule readers for strict GPs. >>>> */ >>>> >>>> /* 2) batch handling */ >>>> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h >>>> index dd1c156c1759..baf57745b42f 100644 >>>> --- a/kernel/rcu/tree_plugin.h >>>> +++ b/kernel/rcu/tree_plugin.h >>>> @@ -486,13 +486,16 @@ rcu_preempt_deferred_qs_irqrestore(struct >>>> task_struct *t, unsigned long flags) >>>> struct rcu_node *rnp; >>>> union rcu_special special; >>>> >>>> + rdp = this_cpu_ptr(&rcu_data); >>>> + if (rdp->defer_qs_iw_pending == DEFER_QS_PENDING) >>>> + rdp->defer_qs_iw_pending = DEFER_QS_IDLE; >>> >>> Good, this is where the request actually gets serviced. >>> >>>> + >>>> /* >>>> * If RCU core is waiting for this CPU to exit its critical section, >>>> * report the fact that it has exited. Because irqs are disabled, >>>> * t->rcu_read_unlock_special cannot change. >>>> */ >>>> special = t->rcu_read_unlock_special; >>>> - rdp = this_cpu_ptr(&rcu_data); >>>> if (!special.s && !rdp->cpu_no_qs.b.exp) { >>>> local_irq_restore(flags); >>>> return; >>>> @@ -623,12 +626,24 @@ notrace void rcu_preempt_deferred_qs(struct >>>> task_struct *t) >>>> */ >>>> static void rcu_preempt_deferred_qs_handler(struct irq_work *iwp) >>>> { >>>> - unsigned long flags; >>>> - struct rcu_data *rdp; >>>> + volatile unsigned long flags; >>> >>> I don't understand why this wants to be volatile. >>> >>> Unless maybe you want to make sure that gdb can see it, in >>> which case, is there an existing Kconfig option for that? Maybe >>> CONFIG_DEBUG_INFO_NONE=n? >> >> This does not need to be volatile, sorry it was an older remnant (back when >> the >> handler was a NOOP in the v1, and I was afraid of compiler optimizations >> ;-)). >> But its no longer needed so I shall drop it (the volatile) :) > > Whew! ;-) > >>>> + struct rcu_data *rdp = this_cpu_ptr(&rcu_data); >>>> >>>> - rdp = container_of(iwp, struct rcu_data, defer_qs_iw); >>>> local_irq_save(flags); >>>> - rdp->defer_qs_iw_pending = false; >>>> + >>>> + /* >>>> + * Requeue the IRQ work on next unlock in following situation: >>>> + * 1. rcu_read_unlock() queues IRQ work (state -> DEFER_QS_PENDING) >>>> + * 2. CPU enters new rcu_read_lock() >>>> + * 3. IRQ work runs but cannot report QS due to rcu_preempt_depth() > >>>> 0 >>>> + * 4. rcu_read_unlock() does not re-queue work (state still PENDING) >>>> + * 5. Deferred QS reporting does not happen. >>>> + */ >>>> + if (rcu_preempt_depth() > 0) { >>>> + WRITE_ONCE(rdp->defer_qs_iw_pending, DEFER_QS_IDLE); >>> >>> Shouldn't we have just this WRITE_ONCE() in this then-clause? >> >> No, because if we let the IRQ work handler do that before we can execute >> rcu_preempt_deferred_qs_handler(), then it will cause infinite recursion, >> because an RCU read-side critical section can again try to queue the IRQ work >> (before entering the scheduler). Also testing shows doing that will reproduce >> the hang we're fixing. >> >> I think we should rename defer_qs_iw_pending to defer_qs_pending to better >> clarify that we are tracking the "Deferred QS" reporting than if the IRQ work >> actually ran? > > Here is the patch: > > + if (rcu_preempt_depth() > 0) { > + WRITE_ONCE(rdp->defer_qs_iw_pending, DEFER_QS_IDLE); > + local_irq_restore(flags); > + return; > + } > local_irq_restore(flags); > } > > After the WRITE_ONCE, you restore interrupts and return. Which is also > what would happen if there was only the WRITE_ONCE() in the then-clause, > correct? > > Or am I missing something subtle here?
Oh, true! I will delete those 2 lines inside the if block. Thanks! - Joel > > Thanx, Paul

