On 2026/6/16 14:56, David Hildenbrand (Arm) wrote:
>>>
>>>
>>> Assume that we enlighten all non-atomics to grab the rcu read lock, such as
>>
>> These non-atomics are defined and used because they want to avoid atomic ops 
>> overhead?
>> So I'm afraid using rcu read lock in these places would lead to unexpected 
>> overhead.
> 
> It should be cheaper than atomics IIUC. Further, I assume that some pages 
> could
> batch over multiple such operations (esp. page freeing path when we process 
> tail
> pages).
> 
> With !CONFIG_PREEMPT_RCU it's simply preempt_disable()/preempt_enable(), which
> is either a NOP or just adjusting the preempt counter of the current thread. 
> Cheap.
> 
> With CONFIG_PREEMPT_RCU we mostly increment current->rcu_read_lock_nesting. 
> But
> there might be a function call involved (did not look into the details). So 
> that
> variant should be slightly more expensive.

I scanned the code and found rcu_read_unlock_special might be called in some 
cases.
Some expensive ops, e.g. irq_work_queue_on, might be called in some corner 
cases.
So the overhead of rcu read lock might be fluctuating.

> 
> We'd have to measure what an addition rcu read lock would cost in there. that
> should be fairly easy to benchmark.

Sure. We can do that if needed.

> 
>>>
>>> Maybe that would work. There would still be issues to solve
>>>
>>> (a) We don't hold the mf_mutex on all call paths, but we really need it so a
>>> page_test_set_hwpoison() cannot race in weird ways with the other 
>>> primitives I think.
>>>
>>> (b) There are some leftover SetPageHWPoison etc. instances. The ones in
>>> arch/x86/kernel/cpu/mce/core.c likely cannot grab the mutex, but maybe they 
>>> are
>>> corner cases either way and we can document the situation.
>>>
>>>
>>> Further, while I assume the synchronize_rcu() on the MCE path should be fine
>>> (who cares about performance there?), I don't know if the added RCU read 
>>> lock
>>> on some paths could be noticable.
>>>
>>> So one idea worth discussing, but I am sure there are more problems.
>>
>> I think this is a good idea, although there are some remaining issues.
>> But such race should be really rare, is it worth all this effort? Could we
>> simply aim to resolve, not to be flawless? I.e. could we simply check
>> and re-set the hwpoison flag at the end of memory_failure handling to
>> simply avoid losing hwpoison flag as a best-effort attempt? Would it be
>> acceptable?
> 
> Hacky. Sufficient for the hypervisor to suspend the nonatomic-setting CPU at 
> the
> wrong time to still trigger the same behavior.

Right. hypervisor could make the issue easier to trigger...

> 
> I think, either we fix it properly, or we redesign hwpoison handling to deal
> with setting/clearing becoming stale at some random point in the future.

I think your proposal, although there are still some issues to be resolved, is
nevertheless a good solution. We could also wait and see if anyone comes up with
a better one.

Thanks.
.

Reply via email to