** Changed in: linux (Ubuntu Jammy)
       Status: In Progress => Fix Committed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2089373

Title:
  WARN in trc_wait_for_one_reader about failed IPIs

Status in linux package in Ubuntu:
  Invalid
Status in linux source package in Jammy:
  Fix Committed

Bug description:
  [Impact]

  When ending bpf tracing, 5.15 kernels now report a warning in
  trc_wait_for_one_reader() on platforms that support hot-plugging CPUs
  but do not have all of their hotplug slots populated. In this
  submitter's environment, it reproduces on Xen EC2 instances, but not
  on Nitro ones. The warning looks like this:

  kernel: [ 6416.920266] ------------[ cut here ]------------
  kernel: [ 6416.920272] trc_wait_for_one_reader(): smp_call_function_single() failed for CPU: 64
  kernel: [ 6416.920289] WARNING: CPU: 0 PID: 13 at kernel/rcu/tasks.h:1044 trc_wait_for_one_reader+0x2b8/0x300
  kernel: [ 6416.920299] Modules linked in: xt_state xt_connmark nf_conntrack_netlink nfnetlink xt_addrtype xt_statistic xt_nat xt_tcpudp ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs nvidia_uvm(POE) nvidia_drm(POE) drm_kms_helper cec rc_core fb_sys_fops syscopyarea sysfillrect sysimgblt nvidia_modeset(POE) nvidia(POE) iptable_mangle ip6table_mangle ip6table_filter ip6table_nat ip6_tables xt_MASQUERADE xt_conntrack xt_comment iptable_filter xt_mark iptable_nat nf_nat bpfilter aufs overlay udp_diag tcp_diag inet_diag binfmt_misc nls_iso8859_1 dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua crct10dif_pclmul crc32_pclmul ghash_clmulni_intel sha256_ssse3 sha1_ssse3 aesni_intel input_leds psmouse crypto_simd cryptd serio_raw floppy sch_fq_codel nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ena drm efi_pstore ip_tables x_tables autofs4
  kernel: [ 6416.920368] CPU: 0 PID: 13 Comm: rcu_tasks_trace Tainted: P OE 5.15.0-1071-aws #77~20.04.1-Ubuntu
  kernel: [ 6416.920372] Hardware name: Xen HVM domU, BIOS 4.11.amazon 08/24/2006
  kernel: [ 6416.920374] RIP: 0010:trc_wait_for_one_reader+0x2b8/0x300
  kernel: [ 6416.920376] Code: 00 00 00 4c 89 ef e8 37 ac 4e 00 eb 9f 44 89 fa 48 c7 c6 00 63 e2 b8 48 c7 c7 a0 9a 1e b9 c6 05 2f 2e 09 02 01 e8 15 2e b9 00 <0f> 0b e9 31 ff ff ff 4c 89 ee 48 c7 c7 20 df b7 b9 e8 a2 99 52 00
  kernel: [ 6416.920380] RSP: 0018:ffff9e048c4efe00 EFLAGS: 00010286
  kernel: [ 6416.920382] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000027
  kernel: [ 6416.920384] RDX: 0000000000000027 RSI: 0000000000000003 RDI: ffff93074ae20588
  kernel: [ 6416.920385] RBP: ffff9e048c4efe28 R08: ffff93074ae20580 R09: 0000000000000001
  kernel: [ 6416.920387] R10: 0000000000ffff0a R11: ffff93463feb2c7f R12: ffff92cbc6a1e600
  kernel: [ 6416.920389] R13: 0000000000000040 R14: 00000000000205a4 R15: 0000000000000040
  kernel: [ 6416.920390] FS: 0000000000000000(0000) GS:ffff93074ae00000(0000) knlGS:0000000000000000
  kernel: [ 6416.920393] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  kernel: [ 6416.920394] CR2: 00007f4a72b04098 CR3: 00000046c8964001 CR4: 00000000001706f0
  kernel: [ 6416.920399] Call Trace:
  kernel: [ 6416.920401] <TASK>
  kernel: [ 6416.920404] ? show_regs.cold+0x1a/0x1f
  kernel: [ 6416.920410] ? trc_wait_for_one_reader+0x2b8/0x300
  kernel: [ 6416.920412] ? __warn+0x8b/0xe0
  kernel: [ 6416.920418] ? trc_wait_for_one_reader+0x2b8/0x300
  kernel: [ 6416.920421] ? report_bug+0xd5/0x110
  kernel: [ 6416.920427] ? handle_bug+0x39/0x90
  kernel: [ 6416.920431] ? exc_invalid_op+0x19/0x70
  kernel: [ 6416.920434] ? asm_exc_invalid_op+0x1b/0x20
  kernel: [ 6416.920442] ? trc_wait_for_one_reader+0x2b8/0x300
  kernel: [ 6416.920446] rcu_tasks_trace_postscan+0x47/0x80
  kernel: [ 6416.920449] rcu_tasks_wait_gp+0x108/0x210
  kernel: [ 6416.920453] rcu_tasks_kthread+0x10f/0x1c0
  kernel: [ 6416.920456] ? wait_woken+0x60/0x60
  kernel: [ 6416.920462] ? show_rcu_tasks_trace_gp_kthread+0x80/0x80
  kernel: [ 6416.920464] kthread+0x12a/0x150
  kernel: [ 6416.920471] ? set_kthread_struct+0x50/0x50
  kernel: [ 6416.920476] ret_from_fork+0x22/0x30
  kernel: [ 6416.920485] </TASK>
  kernel: [ 6416.920486] ---[ end trace 0500611ddaff33a7 ]---

  The problem appears when:

  - The system is performing an rcu_tasks_trace grace period wait
  - The system has more CPU hotplug slots available than are populated
  - The rcu_tasks postscan detects a holdout

  The problem is actually caused by a mismerge of 9b3c4ab304
  ("sched,rcu: Rework try_invoke_on_locked_down_task()"). When that
  patch was applied, a conflict around task nesting was improperly
  resolved, which led to quiescent tasks getting flagged as holdouts.
  This in turn results in more IPIs than necessary being sent to idle
  CPUs, as well as WARNs about failing to send IPIs to CPUs that aren't
  running.

  The fix is a twofer:

  1) manually correct the mismerge in the same way that mainline
     resolved the conflict, and
  2) backport an additional RCU patch that confines the rcu_tasks
     postscan to only CPUs that are running.

  [Backport]

  The upstream merge that shows the correct manual resolution of the
  merge conflicts is in this commit:

  commit 6fedc28076bbbb32edb722e80f9406a3d1d668a8
  Merge tag 'rcu.2021.11.01a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu

  specifically:

  > @@ -951,18 +942,18 @@ static int trc_inspect_reader(struct task_struct *t, void *arg)
  >                 n_heavy_reader_updates++;
  >                 if (ofl)
  >                         n_heavy_reader_ofl_updates++;
  > -               in_qs = true;
  > +               nesting = 0;
  >         } else {
  >                 // The task is not running, so C-language access is safe.
  > -               in_qs = likely(!t->trc_reader_nesting);
  > +               nesting = t->trc_reader_nesting;
  >         }
  >
  > -       // Mark as checked so that the grace-period kthread will
  > -       // remove it from the holdout list.
  > -       t->trc_reader_checked = true;
  > -
  > -       if (in_qs)
  > -               return 0; // Already in quiescent state, done!!!
  > +       // If not exiting a read-side critical section, mark as checked
  > +       // so that the grace-period kthread will remove it from the
  > +       // holdout list.
  > +       t->trc_reader_checked = nesting >= 0;
  > +       if (nesting <= 0)
  > +               return nesting ? -EINVAL : 0; // If in QS, done, otherwise try again later.

  The additional rcu_tasks patch that runs the postscan only on online
  CPUs is:

  commit 5c9a9ca44fda41c5e82f50efced5297a9c19760d
  rcu-tasks: Idle tasks on offline CPUs are in quiescent

  I've additionally reached out to upstream about including this in
  stable:

  https://lore.kernel.org/stable/c56243da5c8b4451097b39468166248790f9a1de.1732237776.git.k...@templeofstupid.com/T/#t

  [Test]

  A trivial reproducer for this problem is to use an up-to-date version
  of bpftrace to run a kfunc probe, which, when destroyed, uses the
  rcu_tasks_trace facility to clean up:

  bpftrace -e 'kfunc:tcp_reset {@a = count();}'
  ^C

  That is all that's necessary to reproduce the problem on a Xen EC2
  system.

  I've run with and without the patches applied and can confirm that
  either patch alone, or both together, is sufficient to resolve the
  problem. Correcting the nesting ensures that idle CPUs don't get
  flagged as holdouts, and confining the scan to just online CPUs
  ensures that even if we incorrectly flag a CPU as a holdout, the
  warning won't trigger because sending the IPI won't fail.

  [Potential Regression]

  The regression potential is low. The corrected commit has been present
  in mainline since 2021, and the fix to only run the postscan on online
  CPUs has been present since 2022.
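
For readers following the [Backport] diff, the corrected nesting decision can be modelled in userspace roughly as below. This is only a sketch, not the kernel code: struct fake_task, model_inspect(), model_selfcheck() and MODEL_IN_READER are hypothetical names made up for illustration, and the sketch covers only the "task is not running" branch, where nesting is read straight from the task. The meaning of the three nesting ranges is taken from the comments in the diff.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two task_struct fields the diff touches. */
struct fake_task {
	int trc_reader_nesting;   /* >0: in a reader, 0: quiescent, <0: exiting a reader */
	bool trc_reader_checked;  /* true lets the GP kthread drop the task from the holdout list */
};

/* Placeholder result (an assumption of this sketch) meaning "still in a
 * reader; the real function carries on past this point". */
#define MODEL_IN_READER 1

/* Model of the corrected tail of trc_inspect_reader() shown in the diff. */
static int model_inspect(struct fake_task *t)
{
	int nesting = t->trc_reader_nesting;

	/* Only a task that is NOT exiting a read-side section is marked checked. */
	t->trc_reader_checked = nesting >= 0;
	if (nesting <= 0)
		return nesting ? -EINVAL : 0; /* 0: in QS, done; -EINVAL: try again later */
	return MODEL_IN_READER;
}

/* Walk the three cases once each. */
static void model_selfcheck(void)
{
	struct fake_task idle = { .trc_reader_nesting = 0 };
	struct fake_task exiting = { .trc_reader_nesting = -1 };
	struct fake_task reading = { .trc_reader_nesting = 2 };

	assert(model_inspect(&idle) == 0 && idle.trc_reader_checked);
	assert(model_inspect(&exiting) == -EINVAL && !exiting.trc_reader_checked);
	assert(model_inspect(&reading) == MODEL_IN_READER && reading.trc_reader_checked);
}
```

The point of the model is the first case: a task with nesting == 0 is marked checked and reported as done, so it should not remain on the holdout list and should not need an IPI.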