Hi Kai, >Some nits in spell/grammar below. Sorry I didn't go through any spell/grammar >check last night since it was already very late for me. > >Btw, the v4 is quicker than I expected anyway. We are still in merge window so >maintainers are not expected to review this. For more info, see "Merge window" >section in Documentation/process/maintainer-tip.rst. > >On Wed, 2026-06-24 at 22:20 +0800, Jun Miao wrote: >> The kernel resets all EPC pages to a clean state in a loop before >> using them for enclaves. The number of EPC pages could be large >> (e.g., GBs) thus resetting them could take a fair amount of time. >> Because of that, during > >s/fair/significant > >> early boot, the kernel resets EPC pages through a kernel thread >> ksgxd() and there's a cond_resched() after resetting each EPC page. >> >> This is fine in most cases, but becomes a problem when there's other >> kernel code waiting for RCU-Tasks grace period but the cond_resched() >> in ksgxd() never triggers rescheduling. Because cond_resched() >> doesn't report quiescent state when it doesn't trigger rescheduling, >> the thread that is waiting for RCU-Tasks grace period will need to wait >> until all >EPC pages are reset. >> >> For instance, BPF LSM subsystem can invoke synchronize_rcu_tasks() at >> kernel boot time. A VM with a large EPC assigned and have BPF LSM >> enabled can take > ^ >Remove the 'have'. > >> a long time to boot, with a call trace triggered: >> >> rcu_tasks_wait_gp: rcu_tasks grace period number 1 (since boot) is 130631 >jiffies old. >> INFO: task systemd:1 blocked for more than 122 seconds. >> ... >> task:systemd state:D stack:0 pid:1 tpid:1 ppid:0 flags:0x00000002 >> Call Trace: >> ... >> schedule_timeout+0x157/0x170 >> wait_for_completion+0x88/0x150 >> __wait_rcu_gp+0x17e/0x190 >> synchronize_rcu_tasks_generic+0x64/0x60 >> ... >> synchronize_rcu_tasks+0x15/0x20 >> register_ftrace_direct+0x31f/0x350 >> ... >> bpf_trampoline_link_prog+0x33/0x60 >> bpf_tracing_prog_attach+0x3c5/0x5f0 >> >> Replace cond_resched() with cond_resched_tasks_rcu_qs() which >> explicitly report quiescent > >s/report quiescent/reports quiescent state > >> regardless whether actual rescheduling is triggered. Resetting all >> EPC pages in ksgxd() > >s/regardless/regardless of > >> isn't performance critical so the extra cost of cond_resched_tasks_rcu_qs() >> isn't >a problem. >> >> Tests showed this reduced the VM kernel boot time from ~50s to ~700ms. >> >> Reported-by: Challvy Tee <[email protected]> >> Link: https://github.com/systemd/systemd/issues/40423 >> Fixes: e7e0545299d8 ("x86/sgx: Initialize metadata for Enclave Page >> Cache (EPC) sections") >> Tested-by: Challvy Tee <[email protected]> >> Suggested-by: Kai Huang <[email protected]> >> Co-developed-by: Fan Du <[email protected]> >> Signed-off-by: Fan Du <[email protected]> >> Signed-off-by: Jun Miao <[email protected]> > > >Btw, English isn't my first language, so feel free to use AI to improve the >above >changelog if you want (I just did but not sure the changes are all better >though).
Thank you for your detailed guidance. I was still a bit careless. Got it all your review, I will review commit log through AI again before push V5 to make it more perfect. >Anyway, feel free to add: > >Reviewed-by: Kai Huang <[email protected]> It is my pleasure :-) --- Warm regards Jun Miao

