On 2026/6/16 14:56, David Hildenbrand (Arm) wrote: >>> >>> >>> Assume that we enlighten all non-atomics to grab the rcu read lock, such as >> >> These non-atomics are defined and used because they want to avoid atomic ops >> overhead? >> So I'm afraid using rcu read lock in these places would lead to unexpected >> overhead. > > It should be cheaper than atomics IIUC. Further, I assume that some pages > could > batch over multiple such operations (esp. page freeing path when we process > tail > pages). > > With !CONFIG_PREEMPT_RCU it's simply preempt_disable()/preempt_enable(), which > is either a NOP or just adjusting the preempt counter of the current thread. > Cheap. > > With CONFIG_PREEMPT_RCU we mostly increment current->rcu_read_lock_nesting. > But > there might be a function call involved (did not look into the details). So > that > variant should be slightly more expensive.
I scanned the code and found rcu_read_unlock_special might be called in some cases. Some expensive ops, e.g. irq_work_queue_on, might be called in some corner cases. So the overhead of rcu read lock might be fluctuating. > > We'd have to measure what an addition rcu read lock would cost in there. that > should be fairly easy to benchmark. Sure. We can do that if needed. > >>> >>> Maybe that would work. There would still be issues to solve >>> >>> (a) We don't hold the mf_mutex on all call paths, but we really need it so a >>> page_test_set_hwpoison() cannot race in weird ways with the other >>> primitives I think. >>> >>> (b) There are some leftover SetPageHWPoison etc. instances. The ones in >>> arch/x86/kernel/cpu/mce/core.c likely cannot grab the mutex, but maybe they >>> are >>> corner cases either way and we can document the situation. >>> >>> >>> Further, while I assume the synchronize_rcu() on the MCE path should be fine >>> (who cares about performance there?), I don't know if the added RCU read >>> lock >>> on some paths could be noticable. >>> >>> So one idea worth discussing, but I am sure there are more problems. >> >> I think this is a good idea, although there are some remaining issues. >> But such race should be really rare, is it worth all this effort? Could we >> simply aim to resolve, not to be flawless? I.e. could we simply check >> and re-set the hwpoison flag at the end of memory_failure handling to >> simply avoid losing hwpoison flag as a best-effort attempt? Would it be >> acceptable? > > Hacky. Sufficient for the hypervisor to suspend the nonatomic-setting CPU at > the > wrong time to still trigger the same behavior. Right. hypervisor could make the issue easier to trigger... > > I think, either we fix it properly, or we redesign hwpoison handling to deal > with setting/clearing becoming stale at some random point in the future. I think your proposal, although there are still some issues to be resolved, is nevertheless a good solution. We could also wait and see if anyone comes up with a better one. Thanks. .

