Verification for noble part 2: I built 6.8.0-80-generic + CONFIG_RCU_TORTURE_TEST=m in the following ppa:
https://launchpad.net/~mruffell/+archive/ubuntu/sf411904-config

I have installed this kernel on a VM running on GCP and loaded the rcutorture kernel module:

$ sudo modprobe rcutorture

The instance has been running for about 6 hours now, and things are still going great:

kernel: rcu_torture_fwd_prog n_max_cbs: 40225
kernel: rcu_torture_fwd_prog: Starting forward-progress test 0
kernel: rcu_torture_fwd_prog_cr: Starting forward-progress test 0
kernel: rcu_torture_fwd_prog_cr: Waiting for CBs: rcu_barrier+0x0/0x80() 0
kernel: rcu_torture_fwd_prog_cr Duration 24 barrier: 26 pending 29818 n_launders: 19591 n_launders_sa: 3256 n_max_gps: 100 n_max_cbs: 39708 cver 2 gps 7
kernel: rcu_torture_fwd_cb_hist: Callback-invocation histogram 0 (duration 64 jiffies): 1s/10: 59299:10
kernel: rcu_torture_fwd_prog_nr: Starting forward-progress test 0
kernel: rcu-torture: rcu_torture_read_exit: Start of episode
kernel: rcu-torture: rcu_torture_read_exit: End of episode
kernel: rcu-torture: rcu_torture_read_exit: Start of episode
kernel: rcu-torture: rcu_torture_read_exit: End of episode
kernel: rcu-torture: rtc: 00000000207629d2 ver: 797622 tfle: 0 rta: 797622 rtaf: 0 rtf: 797613 rtmbe: 0 rtmbkf: 0/0 rtbe: 0 rtbke: 0 rtbf: 0 rtb: 0 nt: 2996728 onoff: 0/0:0/0 -1,0:-1,0 0:0 (HZ=1000) barrier: 0/0:0 read-exits: 20976 nocb-toggles: 0:0
kernel: rcu-torture: Reader Pipe: 1420914123 146774 0 0 0 0 0 0 0 0 0
kernel: rcu-torture: Reader Batch: 1419969223 1091674 0 0 0 0 0 0 0 0 0
kernel: rcu-torture: Free-Block Circulation: 797621 797621 797620 797619 797618 797617 797616 797615 797614 797613 0

I will leave this running for approximately a week, checking in frequently to ensure there are no RCU or SRCU deadlocks.
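The rcu-torture status lines above can also be checked mechanically while the test runs. The rcutorture documentation (https://docs.kernel.org/RCU/torture.html) gives three rules. Any non-zero "Reader Pipe" or "Reader Batch" entry past the first two means RCU is broken. The last "Free-Block Circulation" entry should be zero. The rtmbe, rtbe and rtbke counters should be zero. Below is a minimal C sketch that applies those rules to one captured log line. The helper names (parse_counts, histogram_ok, free_block_ok, field_value, stats_ok) are hypothetical and are not part of rcutorture. The sketch assumes the field layout shown in the 6.8 logs above. Treating rtbf (the count of priority-boosting failures) as must-be-zero is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Parse the whitespace-separated counts that follow `label` in `line`
 * (e.g. "Reader Pipe: 1420914123 146774 0 0 ..."). Stores at most `max`
 * values in `out` and returns how many were parsed, or -1 if `label`
 * does not appear in `line`. */
static int parse_counts(const char *line, const char *label,
                        unsigned long long *out, int max)
{
    assert(line != NULL && label != NULL && out != NULL);
    const char *p = strstr(line, label);
    if (p == NULL)
        return -1;
    p += strlen(label);

    int n = 0;
    while (n < max) {
        char *end;
        unsigned long long v = strtoull(p, &end, 10); /* skips leading spaces */
        if (end == p)
            break; /* no further number on the line */
        out[n++] = v;
        p = end;
    }
    return n;
}

/* Reader Pipe / Reader Batch: only the first two buckets may be non-zero. */
static bool histogram_ok(const char *line, const char *label)
{
    unsigned long long v[32];
    int n = parse_counts(line, label, v, 32);
    if (n < 2)
        return false; /* label missing or line truncated */
    for (int i = 2; i < n; i++)
        if (v[i] != 0)
            return false;
    return true;
}

/* Free-Block Circulation: the last entry must be zero. */
static bool free_block_ok(const char *line)
{
    unsigned long long v[32];
    int n = parse_counts(line, "Free-Block Circulation:", v, 32);
    return n > 0 && v[n - 1] == 0;
}

/* Look up "<name>: <value>" on the "rtc:" statistics line. */
static bool field_value(const char *line, const char *name,
                        unsigned long long *out)
{
    char key[32];
    /* Anchor on the preceding space so only whole field names match. */
    snprintf(key, sizeof key, " %s:", name);
    const char *p = strstr(line, key);
    if (p == NULL)
        return false;
    p += strlen(key);
    char *end;
    *out = strtoull(p, &end, 10);
    return end != p;
}

/* Error counters treated as must-be-zero (including rtbf is an assumption). */
static bool stats_ok(const char *line)
{
    static const char *const must_be_zero[] = { "rtmbe", "rtbe", "rtbke", "rtbf" };
    for (size_t i = 0; i < sizeof must_be_zero / sizeof must_be_zero[0]; i++) {
        unsigned long long v;
        if (!field_value(line, must_be_zero[i], &v) || v != 0)
            return false; /* counter missing or non-zero */
    }
    return true;
}
```

Feed it one captured line at a time. A line such as "Reader Pipe: 1420914123 146774 0 0 0 0 0 0 0 0 0" passes. A non-zero value in the third or later bucket fails. The fixed 32-entry buffers assume the 11-bucket histograms seen in these logs.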
https://bugs.launchpad.net/bugs/2117123

Title: rcu: Eliminate deadlocks involving do_exit() and RCU tasks

Status in linux package in Ubuntu: Fix Released
Status in linux source package in Noble: Fix Committed

Bug description:

BugLink: https://bugs.launchpad.net/bugs/2117123

[Impact]

Tracing tools, such as eBPF fentry programs, can be attached to tasks until very late in do_exit(). Because of this, synchronize_rcu_tasks() needs to wait for the dying task to finish and for the tracer to be removed, even though the task is no longer on the task list. This is explained in:

3f95aa81d265 ("rcu: Make TASKS_RCU handle tasks that are almost done exiting")

> Once a task has passed exit_notify() in the do_exit() code path, it is no
> longer on the task lists, and is therefore no longer visible to
> rcu_tasks_kthread().

An SRCU domain, tasks_rcu_exit_srcu, was introduced to handle this issue. It waits for tasks that could still be in a critical section but are no longer on the RCU tasks list.

Unfortunately, a class of deadlocks in do_exit() has existed for years and has been largely ignored. It was recently reproduced by a syzkaller script:

https://github.com/xupengfe/syzkaller_logs/blob/main/221115_105658_synchronize_rcu/repro.c

Frederic Weisbecker provides the following analysis:

1) TASK A calls unshare(CLONE_NEWPID). This creates a new PID namespace that every subsequent child of TASK A will belong to. But TASK A doesn't itself belong to that new PID namespace.

2) TASK A forks() and creates TASK B (it is a new thread group, so it is a thread group leader). TASK A stays attached to its PID namespace (let's say PID_NS1) and TASK B is the first task belonging to the new PID namespace created by unshare() (let's call it PID_NS2).

3) Since TASK B is the first task attached to PID_NS2, it becomes the PID_NS2 child reaper.

4) TASK A forks() again and creates TASK C, which gets attached to PID_NS2.
Note how TASK C has TASK A as a parent (belonging to PID_NS1) but has TASK B (belonging to PID_NS2) as a pid_namespace child_reaper.

5) TASK B exits. Since it is the child reaper for PID_NS2, it has to kill all other tasks attached to PID_NS2 and wait for all of them to die before reaping itself (zap_pid_ns_processes()).

Note that it seems to make a misleading assumption here, trusting that all tasks in PID_NS2 get reaped either by a parent belonging to the same namespace or by TASK B. It is also confident that, since it deactivated the SIGCHLD handler, all the remaining tasks ultimately autoreap, and it waits for that to happen. However, TASK C escapes that rule because it will get reaped by its parent TASK A, which belongs to PID_NS1.

6) TASK A calls synchronize_rcu_tasks(), which leads to synchronize_srcu(&tasks_rcu_exit_srcu).

7) TASK B is waiting for TASK C to get reaped (wrongly assuming it autoreaps). But TASK B is under a tasks_rcu_exit_srcu SRCU critical section (exit_notify() is between exit_tasks_rcu_start() and exit_tasks_rcu_finish()), blocking TASK A.

8) TASK C exits. Since TASK A is its parent, TASK C waits for TASK A to reap it, but TASK A can't, because TASK A waits for TASK B, which waits for TASK C.

So there is a circular dependency:

_ TASK A waits for TASK B to get out of the tasks_rcu_exit_srcu SRCU critical section
_ TASK B waits for TASK C to get reaped
_ TASK C waits for TASK A to reap it

An example stack trace is:

kernel: INFO: task rcu_tasks_trace:15 blocked for more than 121 seconds.
kernel: Not tainted 6.8.0-63-generic #66-Ubuntu
kernel: task:rcu_tasks_trace state:D stack:0 pid:15 tgid:15 ppid:2 flags:0x00004000
kernel: Call Trace:
kernel: <TASK>
kernel: __schedule+0x27c/0x6b0
kernel: schedule+0x33/0x110
kernel: schedule_timeout+0x157/0x170
kernel: wait_for_completion+0x88/0x150
kernel: __wait_rcu_gp+0x17e/0x190
kernel: synchronize_rcu+0x12d/0x140
kernel: ? __pfx_call_rcu_hurry+0x10/0x10
kernel: ? __pfx_wakeme_after_rcu+0x10/0x10
kernel: rcu_tasks_trace_postscan+0xe/0x20
kernel: rcu_tasks_wait_gp+0x119/0x310
kernel: ? _raw_spin_lock_irqsave+0xe/0x20
kernel: ? rcu_tasks_need_gpcb+0x1f7/0x350
kernel: ? __pfx_rcu_tasks_kthread+0x10/0x10
kernel: rcu_tasks_one_gp+0x122/0x150
kernel: rcu_tasks_kthread+0xa4/0xd0
kernel: kthread+0xef/0x120
kernel: ? __pfx_kthread+0x10/0x10
kernel: ret_from_fork+0x44/0x70
kernel: ? __pfx_kthread+0x10/0x10
kernel: ret_from_fork_asm+0x1b/0x30
kernel: </TASK>
kernel: task:system-probe state:D stack:0 pid:1989 tgid:1931 ppid:1926 flags:0x00000002
kernel: Call Trace:
kernel: <TASK>
kernel: __schedule+0x27c/0x6b0
kernel: schedule+0x33/0x110
kernel: schedule_timeout+0x157/0x170
kernel: wait_for_completion+0x88/0x150
kernel: __wait_rcu_gp+0x17e/0x190
kernel: synchronize_rcu_tasks_generic+0x64/0xe0
kernel: ? __pfx_call_rcu_tasks_trace+0x10/0x10
kernel: ? __pfx_wakeme_after_rcu+0x10/0x10
kernel: synchronize_rcu_tasks_trace+0x15/0x20
kernel: perf_event_detach_bpf_prog+0x7d/0xe0
kernel: _free_event+0x20e/0x2a0
kernel: perf_event_release_kernel+0x281/0x2e0
kernel: perf_release+0x15/0x30
kernel: __fput+0xa0/0x2e0
kernel: __fput_sync+0x1c/0x30
kernel: __x64_sys_close+0x3e/0x90
kernel: x64_sys_call+0x1fec/0x25a0
kernel: do_syscall_64+0x7f/0x180
kernel: ? do_syscall_64+0x8c/0x180
kernel: ? filp_flush+0x57/0x90
kernel: ? syscall_exit_to_user_mode+0x86/0x260
kernel: ? do_syscall_64+0x8c/0x180
kernel: ? restore_fpregs_from_fpstate+0x3d/0xd0
kernel: ? switch_fpu_return+0x55/0xf0
kernel: ? filp_flush+0x57/0x90
kernel: ? syscall_exit_to_user_mode+0x86/0x260
kernel: ? do_syscall_64+0x8c/0x180
kernel: ? do_syscall_64+0x8c/0x180
kernel: ? filp_flush+0x57/0x90
kernel: ? syscall_exit_to_user_mode+0x86/0x260
kernel: ? do_syscall_64+0x8c/0x180
kernel: ? do_syscall_64+0x8c/0x180
kernel: ? do_syscall_64+0x8c/0x180
kernel: ? do_syscall_64+0x8c/0x180
kernel: ? irqentry_exit_to_user_mode+0x7b/0x260
kernel: ? irqentry_exit+0x43/0x50
kernel: entry_SYSCALL_64_after_hwframe+0x78/0x80

[Fix]

The entire patchset is listed below. 3 of the 7 commits have already been applied to ubuntu-noble because they were dependencies of other commits. We only need the 4 missing commits.

This was mainlined in 6.9-rc1 by the following commits:

commit 2eb52fa8900e642b3b5054c4bf9776089d2a935f
Author: Paul E. McKenney
Date: Mon Dec 4 09:33:29 2023 -0800
Subject: rcu-tasks: Repair RCU Tasks Trace quiescence check
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2eb52fa8900e642b3b5054c4bf9776089d2a935f
Applied: Yes. ubuntu-noble 7e16c7d2a1ee

commit bfe93930ea1ea3c6c115a7d44af6e4fea609067e
Author: Paul E. McKenney
Date: Mon Feb 5 13:08:22 2024 -0800
Subject: rcu-tasks: Add data to eliminate RCU-tasks/do_exit() deadlocks
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=bfe93930ea1ea3c6c115a7d44af6e4fea609067e
Applied: Yes. ubuntu-noble b9014deb33e6

commit 30ef09635b9ed3ebca4f677495332a2e444a5cda
Author: Paul E. McKenney
Date: Thu Feb 22 12:29:54 2024 -0800
Subject: rcu-tasks: Initialize callback lists at rcu_init() time
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=30ef09635b9ed3ebca4f677495332a2e444a5cda
Applied: No. Needed.

commit 46faf9d8e1d52e4a91c382c6c72da6bd8e68297b
Author: Paul E. McKenney
Date: Mon Feb 5 13:10:19 2024 -0800
Subject: rcu-tasks: Initialize data to eliminate RCU-tasks/do_exit() deadlocks
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=46faf9d8e1d52e4a91c382c6c72da6bd8e68297b
Applied: Yes. ubuntu-noble c8da4b0160db

commit 6b70399f9ef3809f6e308fd99dd78b072c1bd05c
Author: Paul E. McKenney
Date: Fri Feb 2 11:28:45 2024 -0800
Subject: rcu-tasks: Maintain lists to eliminate RCU-tasks/do_exit() deadlocks
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6b70399f9ef3809f6e308fd99dd78b072c1bd05c
Applied: No. Needed.

commit 1612160b91272f5b1596f499584d6064bf5be794
Author: Paul E. McKenney
Date: Fri Feb 2 11:49:06 2024 -0800
Subject: rcu-tasks: Eliminate deadlocks involving do_exit() and RCU tasks
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1612160b91272f5b1596f499584d6064bf5be794
Applied: No. Needed.

commit 0bb11a372fc8d7006b4d0f42a2882939747bdbff
Author: Paul E. McKenney
Date: Thu Feb 1 06:10:26 2024 -0800
Subject: rcu-tasks: Maintain real-time response in rcu_tasks_postscan()
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0bb11a372fc8d7006b4d0f42a2882939747bdbff
Applied: No. Needed.

The 4 needed commits are all clean cherry picks.

[Testcase]

To reproduce the do_exit() deadlock using the syzkaller repro:

$ sudo apt install build-essential
$ wget https://raw.githubusercontent.com/xupengfe/syzkaller_logs/refs/heads/main/221115_105658_synchronize_rcu/repro.c
$ gcc -o repro repro.c
$ sudo ./repro
$ journalctl -f -t kernel

Due to the high regression risk of this patchset, we should run rcutorture, the RCU test suite, on a patched kernel to ensure there are no deadlocks.

To run rcutorture on the kernel build:

Documentation: https://docs.kernel.org/RCU/torture.html

1) Clone the kernel source code.
2) Save the following patch, which enables CONFIG_RCU_TORTURE_TEST, as 0001-UBUNTU-Config-Enable-CONFIG_RCU_TORTURE_TEST.patch:
https://launchpadlibrarian.net/805611005/0001-UBUNTU-Config-Enable-CONFIG_RCU_TORTURE_TEST.patch
3) $ git am 0001-UBUNTU-Config-Enable-CONFIG_RCU_TORTURE_TEST.patch
4) Build a new kernel with the patch applied, and boot into it.
5) $ sudo modprobe rcutorture
6) Follow dmesg.
$ journalctl -f -t kernel
kernel: rcu-torture: rcu_torture_read_exit: Start of episode
kernel: rcu-torture: rcu_torture_read_exit: End of episode
kernel: rcu_torture_fwd_prog_nr: 0 Duration 50060 cver 1081 gps 1490
kernel: rcu_torture_fwd_prog_nr: Waiting for CBs: rcu_barrier+0x0/0x80() 0
kernel: rcu-torture: rtc: 00000000c099ebf1 ver: 62341 tfle: 0 rta: 62342 rtaf: 0 rtf: 62331 rtmbe: 0 rtmbkf: 0/48597 rtbe: 0 rtbke: 0 rtbf: 0 rtb: 0 nt: 1396993 onoff: 0/0:0/0 -1,0:-1,0 0:0 (HZ=1000) barrier: 0/0:0 read-exits: 1792 nocb-toggles: 0:0
kernel: rcu-torture: Reader Pipe: 2350715188 99444 0 0 0 0 0 0 0 0 0
kernel: rcu-torture: Reader Batch: 2350551525 263107 0 0 0 0 0 0 0 0 0
kernel: rcu-torture: Free-Block Circulation: 62341 62340 62339 62338 62336 62335 62334 62333 62332 62331 0

Read the documentation and ensure you see "Success" and no "FAILURE" messages. Ensure all the values that should be 0 are indeed 0. These include rtmbe, rtbe, rtbke, every Reader Pipe and Reader Batch entry after the first two, and the last Free-Block Circulation entry. Leave rcutorture running for several hours or days.

There is a test kernel available in the following ppa:
https://launchpad.net/~mruffell/+archive/ubuntu/sf411904-config

If you install it, it should no longer deadlock on the reproducer, and you can also load the rcutorture kernel module for regression testing.

[Where problems could occur]

We are changing what happens to tasks that are late in do_exit(). They are now added to a new list that keeps track of them while they could be in an RCU critical section.

These are large changes to the RCU subsystem, and they affect nearly every other subsystem of the kernel, as RCU is used everywhere.

If a regression were to occur, it would involve RCU grace periods getting stuck, leading to deadlocks and hung task timeouts with no real workarounds.

We need to test this change with rcutorture for the whole time the kernel is in -proposed.

[Other info]

Upstream mailing list report:
https://lore.kernel.org/lkml/[email protected]/T/#u

Paul E. McKenney's architecture document:
https://docs.google.com/document/d/1hJxgiZ5TMZ4YJkdJPLAkRvq7sYQ-A7svgA8no6i-v8k/edit?usp=sharing

syzkaller scripts, C reproducer, dmesg logs:
https://github.com/xupengfe/syzkaller_logs/tree/main/221115_105658_synchronize_rcu

Upstream mailing list submission:
https://lore.kernel.org/lkml/[email protected]/T/#u

