Hi Kai,

> (Reminder: you forgot the [email protected]).
> Ok, + CC linux-sgx in this reply.

> Could you move some context from your v1 and refine together with the above
> two paragraphs?

Okay, what about this commit description in v5?

Subject: [PATCH v5] x86/sgx: Fix RCU Tasks stall in EPC sanitization loop

During early boot, ksgxd (the Intel Software Guard Extensions kernel
thread) iterates over all post-kexec dirty EPC pages in a tight loop,
calling cond_resched() after each page. However, on isolated CPUs (a
common configuration in cloud VMs), cond_resched() never triggers a real
context switch, because TIF_NEED_RESCHED is not set when no competing
runnable task exists on that CPU.

The BPF LSM subsystem can invoke synchronize_rcu_tasks() at kernel boot
time. Because ksgxd is never rescheduled while it sanitizes all EPC
pages, the RCU Tasks grace period cannot complete until the loop
finishes. As a result, a VM may take a long time to boot:

[ 134.806157] rcu_tasks_wait_gp: rcu_tasks grace period number 1 (since boot) is 130631 jiffies old.
[ 248.086158] INFO: task systemd:1 blocked for more than 122 seconds.
[ 248.086491]       Not tainted 6.8.0-90-generic #91-Ubuntu
[ 248.086739] 'echo 0 > /proc/sys/kernel/hung_task_timeout_secs' disables this message.
[ 248.086993] task:systemd state:D stack:0 pid:1 tpid:1 ppid:0 flags:0x00000002
[ 248.087274] Call Trace:
...
[ 248.087939]  schedule_timeout+0x157/0x170
[ 248.088120]  wait_for_completion+0x88/0x150
[ 248.088304]  __wait_rcu_gp+0x17e/0x190
[ 248.088481]  synchronize_rcu_tasks_generic+0x64/0x60
...
[ 248.089047]  synchronize_rcu_tasks+0x15/0x20
[ 248.089260]  register_ftrace_direct+0x31f/0x350
...
[ 248.090339]  bpf_trampoline_link_prog+0x33/0x60
[ 248.090518]  bpf_tracing_prog_attach+0x3c5/0x5f0
...

Tests with this patch applied showed that using
cond_resched_tasks_rcu_qs() reduced the boot time from ~50s to ~10.7s
(systemd-analyze: 724ms kernel + 1.575s initrd + 8.481s userspace =
10.782s).

[ kai: completely trim down/rewrite changelog ]

Reported-by: Challvy Tee <[email protected]>
Link: https://github.com/systemd/systemd/issues/40423
Fixes: e7e0545299d8 ("x86/sgx: Initialize metadata for Enclave Page Cache (EPC) sections")
Tested-by: Challvy Tee <[email protected]>
Suggested-by: Kai Huang <[email protected]>
Co-developed-by: Fan Du <[email protected]>
Signed-off-by: Fan Du <[email protected]>
Signed-off-by: Jun Miao <[email protected]>
---

Warm regards
Jun Miao

