This is the same commit which is discussed in
https://github.com/nodejs/node/issues/55587 and
https://lore.kernel.org/io-uring/3d913aef-8c44-4f50-9bdf-7d9051b08...@app.fastmail.com/T/#u. This
was fixed in 6.6.60 with
https://github.com/gregkh/linux/commit/6a91a5816b289018e0b42a25444c0b4f8c637dca,
and I think Ubuntu needs to backport the same fix.

** Bug watch added: github.com/nodejs/node/issues #55587
   https://github.com/nodejs/node/issues/55587

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2105471

Title:
  io_uring process deadlock

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  QEMU processes stuck on io_uring lock in Ubuntu 24.04, on kernel
  6.8.0-56.

  Over the past two weeks I have been migrating more hosts to Ubuntu
  24.04, coming from 22.04. Since then I occasionally see a VM get
  stuck in the D (uninterruptible sleep) process state; dmesg then
  shows the same call trace as pasted below.

  On Ubuntu 22.04 I was running the HWE kernel packages with kernel
  versions 6.5 and 6.8, although I wasn't running 6.8 as much as I am
  now.

  I did find a locking patch in the 6.8.0-56 changelog and was
  wondering whether it could be the cause:
  +
  +             /*
  +              * For silly syzbot cases that deliberately overflow by huge
  +              * amounts, check if we need to resched and drop and
  +              * reacquire the locks if so. Nothing real would ever hit this.
  +              * Ideally we'd have a non-posting unlock for this, but hard
  +              * to care for a non-real case.
  +              */
  +             if (need_resched()) {
  +                     io_cq_unlock_post(ctx);
  +                     mutex_unlock(&ctx->uring_lock);
  +                     cond_resched();
  +                     mutex_lock(&ctx->uring_lock);
  +                     io_cq_lock(ctx);
  +             }
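
  For what it's worth, the hunk above follows a standard kernel
  pattern: when a long loop runs under locks and need_resched() fires,
  the locks are released in reverse acquisition order, the CPU is
  yielded with cond_resched(), and the locks are retaken. A rough
  userspace analogue of the same pattern, purely for illustration
  (process_items and the mutex names are made up here, and
  sched_yield() stands in for cond_resched()):

  #include <pthread.h>
  #include <sched.h>

  static pthread_mutex_t outer = PTHREAD_MUTEX_INITIALIZER; /* ~uring_lock */
  static pthread_mutex_t inner = PTHREAD_MUTEX_INITIALIZER; /* ~cq lock */

  static void process_items(int nitems)
  {
          pthread_mutex_lock(&outer);
          pthread_mutex_lock(&inner);
          for (int i = 0; i < nitems; i++) {
                  /* ... handle one item under both locks ... */
                  if (i > 0 && i % 1024 == 0) {  /* stand-in for need_resched() */
                          /* drop in reverse acquisition order */
                          pthread_mutex_unlock(&inner);
                          pthread_mutex_unlock(&outer);
                          sched_yield();         /* stand-in for cond_resched() */
                          pthread_mutex_lock(&outer);
                          pthread_mutex_lock(&inner);
                  }
          }
          pthread_mutex_unlock(&inner);
          pthread_mutex_unlock(&outer);
  }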

  /proc/cmdline: BOOT_IMAGE=/boot/vmlinuz-6.8.0-56-generic
  root=/dev/mapper/hv9-root ro verbose security=apparmor rootdelay=10
  max_loop=16 default_hugepagesz=1G hugepagesz=1G hugepages=448
  libata.force=noncq iommu=pt
  crashkernel=512M-4G:128M,4G-8G:256M,8G-:512M

  
  dmesg snippet:
  [Thu Mar 27 18:50:48 2025] INFO: task qemu-system-x86:15480 blocked for more than 552 seconds.
  [Thu Mar 27 18:50:48 2025]       Tainted: G           OE      6.8.0-56-generic #58-Ubuntu
  [Thu Mar 27 18:50:48 2025] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [Thu Mar 27 18:50:48 2025] task:qemu-system-x86 state:D stack:0     pid:15480 tgid:15480 ppid:1      flags:0x00024006
  [Thu Mar 27 18:50:48 2025] Call Trace:
  [Thu Mar 27 18:50:48 2025]  <TASK>
  [Thu Mar 27 18:50:48 2025]  __schedule+0x27c/0x6b0
  [Thu Mar 27 18:50:48 2025]  schedule+0x33/0x110
  [Thu Mar 27 18:50:48 2025]  schedule_preempt_disabled+0x15/0x30
  [Thu Mar 27 18:50:48 2025]  __mutex_lock.constprop.0+0x42f/0x740
  [Thu Mar 27 18:50:48 2025]  ? srso_return_thunk+0x5/0x5f
  [Thu Mar 27 18:50:48 2025]  __mutex_lock_slowpath+0x13/0x20
  [Thu Mar 27 18:50:48 2025]  mutex_lock+0x3c/0x50
  [Thu Mar 27 18:50:48 2025]  __do_sys_io_uring_enter+0x2e7/0x4a0
  [Thu Mar 27 18:50:48 2025]  __x64_sys_io_uring_enter+0x22/0x40
  [Thu Mar 27 18:50:48 2025]  x64_sys_call+0xeda/0x25a0
  [Thu Mar 27 18:50:48 2025]  do_syscall_64+0x7f/0x180
  [Thu Mar 27 18:50:48 2025]  ? srso_return_thunk+0x5/0x5f
  [Thu Mar 27 18:50:48 2025]  ? syscall_exit_to_user_mode+0x86/0x260
  [Thu Mar 27 18:50:48 2025]  ? srso_return_thunk+0x5/0x5f
  [Thu Mar 27 18:50:48 2025]  ? do_syscall_64+0x8c/0x180
  [Thu Mar 27 18:50:48 2025]  ? srso_return_thunk+0x5/0x5f
  [Thu Mar 27 18:50:48 2025]  ? syscall_exit_to_user_mode+0x86/0x260
  [Thu Mar 27 18:50:48 2025]  ? srso_return_thunk+0x5/0x5f
  [Thu Mar 27 18:50:48 2025]  ? do_syscall_64+0x8c/0x180
  [Thu Mar 27 18:50:48 2025]  ? srso_return_thunk+0x5/0x5f
  [Thu Mar 27 18:50:48 2025]  ? __x64_sys_ioctl+0xbb/0xf0
  [Thu Mar 27 18:50:48 2025]  ? srso_return_thunk+0x5/0x5f
  [Thu Mar 27 18:50:48 2025]  ? __x64_sys_ioctl+0xbb/0xf0
  [Thu Mar 27 18:50:48 2025]  ? srso_return_thunk+0x5/0x5f
  [Thu Mar 27 18:50:48 2025]  ? syscall_exit_to_user_mode+0x86/0x260
  [Thu Mar 27 18:50:48 2025]  ? srso_return_thunk+0x5/0x5f
  [Thu Mar 27 18:50:48 2025]  ? __x64_sys_ioctl+0xbb/0xf0
  [Thu Mar 27 18:50:48 2025]  ? srso_return_thunk+0x5/0x5f
  [Thu Mar 27 18:50:48 2025]  ? syscall_exit_to_user_mode+0x86/0x260
  [Thu Mar 27 18:50:48 2025]  ? srso_return_thunk+0x5/0x5f
  [Thu Mar 27 18:50:48 2025]  ? do_syscall_64+0x8c/0x180
  [Thu Mar 27 18:50:48 2025]  ? irqentry_exit+0x43/0x50
  [Thu Mar 27 18:50:48 2025]  ? srso_return_thunk+0x5/0x5f
  [Thu Mar 27 18:50:48 2025]  entry_SYSCALL_64_after_hwframe+0x78/0x80

  I have not yet tried to reproduce this. I can try running fio on a
  test host with the same kernel to see whether I can break it
  consistently.
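
  If fio doesn't trigger it, a minimal liburing stress program along
  these lines might be worth a try. This is only a sketch, not a
  confirmed reproducer: the scratch-file argument is a placeholder,
  and O_DIRECT is an assumption mirroring a typical QEMU
  aio=io_uring,cache=none setup, not a known trigger condition.

  /* io_uring write stress sketch (build: gcc -O2 stress.c -luring).
   * Loops forever rewriting the first QD blocks of a scratch file. */
  #define _GNU_SOURCE
  #include <fcntl.h>
  #include <liburing.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <unistd.h>

  #define QD 64
  #define BS 4096

  int main(int argc, char **argv)
  {
          struct io_uring ring;
          struct io_uring_cqe *cqe;
          void *buf;
          int fd, i, ret;

          if (argc < 2) {
                  fprintf(stderr, "usage: %s <scratch file>\n", argv[0]);
                  return 1;
          }
          fd = open(argv[1], O_WRONLY | O_CREAT | O_DIRECT, 0644);
          if (fd < 0) {
                  perror("open");
                  return 1;
          }
          if (posix_memalign(&buf, BS, BS))
                  return 1;
          memset(buf, 0xaa, BS);
          if (io_uring_queue_init(QD, &ring, 0) < 0)
                  return 1;

          for (;;) {
                  /* queue as many writes as the SQ ring allows */
                  for (i = 0; i < QD; i++) {
                          struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
                          if (!sqe)
                                  break;
                          io_uring_prep_write(sqe, fd, buf, BS, (off_t)i * BS);
                  }
                  /* submit and wait for at least one completion */
                  ret = io_uring_submit_and_wait(&ring, i ? i : 1);
                  if (ret < 0) {
                          fprintf(stderr, "submit: %d\n", ret);
                          break;
                  }
                  /* reap whatever has completed */
                  while (io_uring_peek_cqe(&ring, &cqe) == 0) {
                          if (cqe->res < 0)
                                  fprintf(stderr, "write: %d\n", cqe->res);
                          io_uring_cqe_seen(&ring, cqe);
                  }
          }
          io_uring_queue_exit(&ring);
          close(fd);
          return 0;
  }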

  I also have a crash dump captured from one of the affected hosts.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2105471/+subscriptions
