** Changed in: linux (Ubuntu Jammy)
       Status: In Progress => Fix Committed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2089373

Title:
  WARN in trc_wait_for_one_reader about failed IPIs

Status in linux package in Ubuntu:
  Invalid
Status in linux source package in Jammy:
  Fix Committed

Bug description:
  [Impact]

  When BPF tracing ends, 5.15 kernels now report a warning in
  trc_wait_for_one_reader() on platforms that support CPU hotplug but
  do not have all of their hotplug slots populated.  In this
  submitter's environment, it reproduces on Xen EC2 instances, but not
  on Nitro ones.

  The warning looks like this:

  kernel: [ 6416.920266] ------------[ cut here ]------------
  kernel: [ 6416.920272] trc_wait_for_one_reader(): smp_call_function_single() failed for CPU: 64
  kernel: [ 6416.920289] WARNING: CPU: 0 PID: 13 at kernel/rcu/tasks.h:1044 trc_wait_for_one_reader+0x2b8/0x300
  kernel: [ 6416.920299] Modules linked in: xt_state xt_connmark nf_conntrack_netlink nfnetlink xt_addrtype xt_statistic xt_nat xt_tcpudp ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs nvidia_uvm(POE) nvidia_drm(POE) drm_kms_helper cec rc_core fb_sys_fops syscopyarea sysfillrect sysimgblt nvidia_modeset(POE) nvidia(POE) iptable_mangle ip6table_mangle ip6table_filter ip6table_nat ip6_tables xt_MASQUERADE xt_conntrack xt_comment iptable_filter xt_mark iptable_nat nf_nat bpfilter aufs overlay udp_diag tcp_diag inet_diag binfmt_misc nls_iso8859_1 dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua crct10dif_pclmul crc32_pclmul ghash_clmulni_intel sha256_ssse3 sha1_ssse3 aesni_intel input_leds psmouse crypto_simd cryptd serio_raw floppy sch_fq_codel nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ena drm efi_pstore ip_tables x_tables autofs4
  kernel: [ 6416.920368] CPU: 0 PID: 13 Comm: rcu_tasks_trace Tainted: P OE 5.15.0-1071-aws #77~20.04.1-Ubuntu
  kernel: [ 6416.920372] Hardware name: Xen HVM domU, BIOS 4.11.amazon 08/24/2006
  kernel: [ 6416.920374] RIP: 0010:trc_wait_for_one_reader+0x2b8/0x300
  kernel: [ 6416.920376] Code: 00 00 00 4c 89 ef e8 37 ac 4e 00 eb 9f 44 89 fa 48 c7 c6 00 63 e2 b8 48 c7 c7 a0 9a 1e b9 c6 05 2f 2e 09 02 01 e8 15 2e b9 00 <0f> 0b e9 31 ff ff ff 4c 89 ee 48 c7 c7 20 df b7 b9 e8 a2 99 52 00
  kernel: [ 6416.920380] RSP: 0018:ffff9e048c4efe00 EFLAGS: 00010286
  kernel: [ 6416.920382] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000027
  kernel: [ 6416.920384] RDX: 0000000000000027 RSI: 0000000000000003 RDI: ffff93074ae20588
  kernel: [ 6416.920385] RBP: ffff9e048c4efe28 R08: ffff93074ae20580 R09: 0000000000000001
  kernel: [ 6416.920387] R10: 0000000000ffff0a R11: ffff93463feb2c7f R12: ffff92cbc6a1e600
  kernel: [ 6416.920389] R13: 0000000000000040 R14: 00000000000205a4 R15: 0000000000000040
  kernel: [ 6416.920390] FS: 0000000000000000(0000) GS:ffff93074ae00000(0000) knlGS:0000000000000000
  kernel: [ 6416.920393] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  kernel: [ 6416.920394] CR2: 00007f4a72b04098 CR3: 00000046c8964001 CR4: 00000000001706f0
  kernel: [ 6416.920399] Call Trace:
  kernel: [ 6416.920401] <TASK>
  kernel: [ 6416.920404] ? show_regs.cold+0x1a/0x1f
  kernel: [ 6416.920410] ? trc_wait_for_one_reader+0x2b8/0x300
  kernel: [ 6416.920412] ? __warn+0x8b/0xe0
  kernel: [ 6416.920418] ? trc_wait_for_one_reader+0x2b8/0x300
  kernel: [ 6416.920421] ? report_bug+0xd5/0x110
  kernel: [ 6416.920427] ? handle_bug+0x39/0x90
  kernel: [ 6416.920431] ? exc_invalid_op+0x19/0x70
  kernel: [ 6416.920434] ? asm_exc_invalid_op+0x1b/0x20
  kernel: [ 6416.920442] ? trc_wait_for_one_reader+0x2b8/0x300
  kernel: [ 6416.920446] rcu_tasks_trace_postscan+0x47/0x80
  kernel: [ 6416.920449] rcu_tasks_wait_gp+0x108/0x210
  kernel: [ 6416.920453] rcu_tasks_kthread+0x10f/0x1c0
  kernel: [ 6416.920456] ? wait_woken+0x60/0x60
  kernel: [ 6416.920462] ? show_rcu_tasks_trace_gp_kthread+0x80/0x80
  kernel: [ 6416.920464] kthread+0x12a/0x150
  kernel: [ 6416.920471] ? set_kthread_struct+0x50/0x50
  kernel: [ 6416.920476] ret_from_fork+0x22/0x30
  kernel: [ 6416.920485] </TASK>
  kernel: [ 6416.920486] ---[ end trace 0500611ddaff33a7 ]---

  The problem appears when:

  - The system is performing an rcu_tasks_trace grace period wait
  - The system has more hotplug CPU slots available than are populated
  - The rcu_tasks_trace postscan detects a holdout

  The problem is actually caused by a mismerge of 9b3c4ab304 ("sched,rcu:
  Rework try_invoke_on_locked_down_task()").  When that patch was
  applied, a conflict around task nesting was improperly resolved and
  led to quiescent tasks getting flagged as holdouts.  This in turn
  results in more IPIs than necessary to idle CPUs, as well as WARNs
  about failing to send IPIs to CPUs that aren't running.

  The fix is a twofer: 1) manually correct the mismerge in the same way
  that mainline resolved the conflict, and 2) backport an additional RCU
  patch that confines the rcu_tasks postscan to only CPUs that are
  running.

  [Backport]

  The upstream merge that shows the correct manual resolution of the
  merge conflicts is in this commit:

     commit 6fedc28076bbbb32edb722e80f9406a3d1d668a8
     Merge tag 'rcu.2021.11.01a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu

  specifically:

   > @@ -951,18 +942,18 @@ static int trc_inspect_reader(struct task_struct *t, void *arg)
   >            n_heavy_reader_updates++;
   >            if (ofl)
   >                    n_heavy_reader_ofl_updates++;
   > -          in_qs = true;
   > +          nesting = 0;
   >    } else {
   >            // The task is not running, so C-language access is safe.
   > -          in_qs = likely(!t->trc_reader_nesting);
   > +          nesting = t->trc_reader_nesting;
   >    }
   >  
   > -  // Mark as checked so that the grace-period kthread will
   > -  // remove it from the holdout list.
   > -  t->trc_reader_checked = true;
   > -
   > -  if (in_qs)
   > -          return 0;  // Already in quiescent state, done!!!
   > +  // If not exiting a read-side critical section, mark as checked
   > +  // so that the grace-period kthread will remove it from the
   > +  // holdout list.
   > +  t->trc_reader_checked = nesting >= 0;
   > +  if (nesting <= 0)
   > +          return nesting ? -EINVAL : 0;  // If in QS, done, otherwise try again later.
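
  The corrected decision in the hunk above can be modeled in userspace
  as a rough sketch.  The struct and function names here are
  hypothetical, and the reader_active flag only stands in for whatever
  the kernel does after the quoted hunk when a task is still inside a
  read-side critical section; that part is an assumption, not taken
  from the diff.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the outcome of inspecting one task. */
struct inspect_result {
	bool checked;       /* models t->trc_reader_checked */
	bool reader_active; /* assumption: task is still in a read-side section */
	int  ret;           /* 0 on success, -EINVAL to try again later */
};

/*
 * Userspace model of the corrected logic:
 *   nesting == 0: the task is in a quiescent state, done
 *   nesting  < 0: the task is exiting a critical section, retry later
 *   nesting  > 0: the task is inside a critical section
 */
static struct inspect_result model_inspect_reader(int nesting)
{
	struct inspect_result r = {
		.checked = nesting >= 0,
		.reader_active = false,
		.ret = 0,
	};

	if (nesting <= 0) {
		r.ret = nesting ? -EINVAL : 0;
		return r;
	}

	/* Only a task with positive nesting still has to be waited for. */
	r.reader_active = true;
	return r;
}
```

  The point of the model is that a task with a nesting of zero returns
  early and is never treated as something the grace period has to wait
  for.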

  The additional rcu_tasks patch that runs the postscan only on online
  CPUs is:

     commit 5c9a9ca44fda41c5e82f50efced5297a9c19760d
     rcu-tasks: Idle tasks on offline CPUs are in quiescent states
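
  To illustrate why restricting the scan to online CPUs avoids the
  failed IPIs, here is a small userspace sketch.  It is not kernel
  code: cpu_model, model_send_ipi() and model_postscan_failures() are
  hypothetical names, the slot count is arbitrary, and the -ENXIO
  return for an offline target is an assumption about how the IPI send
  fails.

```c
#include <errno.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 8 /* hypothetical number of hotplug slots */

/* Every slot is "possible"; only populated slots are online. */
struct cpu_model {
	bool online[MODEL_NR_CPUS];
};

/* Stand-in for sending an IPI: fails when the target is not online. */
static int model_send_ipi(const struct cpu_model *m, int cpu)
{
	if (cpu < 0 || cpu >= MODEL_NR_CPUS || !m->online[cpu])
		return -ENXIO;
	return 0;
}

/*
 * Count failed IPIs for a scan that treats every visited idle task as
 * a holdout.  With online_only set, unpopulated slots are skipped.
 */
static int model_postscan_failures(const struct cpu_model *m, bool online_only)
{
	int failures = 0;

	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		if (online_only && !m->online[cpu])
			continue; /* idle task on an offline CPU is quiescent */
		if (model_send_ipi(m, cpu) != 0)
			failures++; /* the kernel would WARN at this point */
	}
	return failures;
}
```

  With four of the eight modeled slots populated, a scan over all
  possible CPUs fails four times, while a scan over online CPUs never
  fails.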

  I've additionally reached out to upstream about including this in
  stable:

  https://lore.kernel.org/stable/c56243da5c8b4451097b39468166248790f9a1de.1732237776.git.k...@templeofstupid.com/T/#t

  [Test]

  A trivial reproducer for this problem is to use an up-to-date version
  of bpftrace to run a kfunc probe, which, when destroyed, uses the
  rcu_tasks_trace facility to clean up:

     bpftrace -e 'kfunc:tcp_reset {@a = count();}'
     ^C

  This is all that's necessary to reproduce the problem on a Xen EC2
  system.

  I've run with and without the patches applied and can confirm that
  either patch alone, or both together, is sufficient to resolve the
  problem.  Correcting the nesting ensures that idle CPUs don't get
  flagged as holdouts, and confining the scan to just online CPUs
  ensures that even if we incorrectly flag a CPU as a holdout, the
  warning won't trigger because sending the IPI won't fail.
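
  The reasoning above amounts to a small truth table, sketched below
  as a userspace model.  The function name and the slot counts are
  hypothetical, and the model assumes that every idle task is
  quiescent and that an IPI to an unpopulated slot always fails.

```c
#include <stdbool.h>

#define MODEL_SLOTS  4 /* hypothetical hotplug slots */
#define MODEL_ONLINE 2 /* only the first two slots are populated */

/*
 * Returns true if the modeled postscan would hit the WARN.
 *   nesting_fixed: quiescent idle tasks are not flagged as holdouts
 *   online_only:   the scan visits only online CPUs
 */
static bool model_warns(bool nesting_fixed, bool online_only)
{
	for (int cpu = 0; cpu < MODEL_SLOTS; cpu++) {
		bool online = cpu < MODEL_ONLINE;
		/* Without the nesting fix, the idle task is wrongly a holdout. */
		bool holdout = !nesting_fixed;

		if (online_only && !online)
			continue;
		if (holdout && !online)
			return true; /* IPI to a CPU that is not running fails */
	}
	return false;
}
```

  In this model the warning appears only when neither change is
  applied, which matches the test results described above.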

  [Potential Regression]

  The regression potential is low.  The corrected commit has been
  present in mainline since 2021 and the fix to only run postscan on
  online CPUs has been present since 2022.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2089373/+subscriptions

