Public bug reported: QEMU processes stuck on io_uring lock in Ubuntu 24.04, on kernel 6.8.0-56.
For the past two weeks I have been migrating more hosts to Ubuntu 24.04 from 22.04. Since then I occasionally see a VM get stuck with its QEMU process in D state; dmesg then shows the same call trace as pasted below. On Ubuntu 22.04 I was running the HWE kernel packages with versions 6.5 and 6.8, although I wasn't running 6.8 as much as I am now.

I did find a locking patch in the 6.8.0-56 changelog and was wondering if that could be the cause:

+
+	/*
+	 * For silly syzbot cases that deliberately overflow by huge
+	 * amounts, check if we need to resched and drop and
+	 * reacquire the locks if so. Nothing real would ever hit this.
+	 * Ideally we'd have a non-posting unlock for this, but hard
+	 * to care for a non-real case.
+	 */
+	if (need_resched()) {
+		io_cq_unlock_post(ctx);
+		mutex_unlock(&ctx->uring_lock);
+		cond_resched();
+		mutex_lock(&ctx->uring_lock);
+		io_cq_lock(ctx);
+	}

/proc/cmdline:
BOOT_IMAGE=/boot/vmlinuz-6.8.0-56-generic root=/dev/mapper/hv9-root ro verbose security=apparmor rootdelay=10 max_loop=16 default_hugepagesz=1G hugepagesz=1G hugepages=448 libata.force=noncq iommu=pt crashkernel=512M-4G:128M,4G-8G:256M,8G-:512M

dmesg snippet:

[Thu Mar 27 18:50:48 2025] INFO: task qemu-system-x86:15480 blocked for more than 552 seconds.
[Thu Mar 27 18:50:48 2025] Tainted: G OE 6.8.0-56-generic #58-Ubuntu
[Thu Mar 27 18:50:48 2025] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Thu Mar 27 18:50:48 2025] task:qemu-system-x86 state:D stack:0 pid:15480 tgid:15480 ppid:1 flags:0x00024006
[Thu Mar 27 18:50:48 2025] Call Trace:
[Thu Mar 27 18:50:48 2025]  <TASK>
[Thu Mar 27 18:50:48 2025]  __schedule+0x27c/0x6b0
[Thu Mar 27 18:50:48 2025]  schedule+0x33/0x110
[Thu Mar 27 18:50:48 2025]  schedule_preempt_disabled+0x15/0x30
[Thu Mar 27 18:50:48 2025]  __mutex_lock.constprop.0+0x42f/0x740
[Thu Mar 27 18:50:48 2025]  ? srso_return_thunk+0x5/0x5f
[Thu Mar 27 18:50:48 2025]  __mutex_lock_slowpath+0x13/0x20
[Thu Mar 27 18:50:48 2025]  mutex_lock+0x3c/0x50
[Thu Mar 27 18:50:48 2025]  __do_sys_io_uring_enter+0x2e7/0x4a0
[Thu Mar 27 18:50:48 2025]  __x64_sys_io_uring_enter+0x22/0x40
[Thu Mar 27 18:50:48 2025]  x64_sys_call+0xeda/0x25a0
[Thu Mar 27 18:50:48 2025]  do_syscall_64+0x7f/0x180
[Thu Mar 27 18:50:48 2025]  ? srso_return_thunk+0x5/0x5f
[Thu Mar 27 18:50:48 2025]  ? syscall_exit_to_user_mode+0x86/0x260
[Thu Mar 27 18:50:48 2025]  ? srso_return_thunk+0x5/0x5f
[Thu Mar 27 18:50:48 2025]  ? do_syscall_64+0x8c/0x180
[Thu Mar 27 18:50:48 2025]  ? srso_return_thunk+0x5/0x5f
[Thu Mar 27 18:50:48 2025]  ? syscall_exit_to_user_mode+0x86/0x260
[Thu Mar 27 18:50:48 2025]  ? srso_return_thunk+0x5/0x5f
[Thu Mar 27 18:50:48 2025]  ? do_syscall_64+0x8c/0x180
[Thu Mar 27 18:50:48 2025]  ? srso_return_thunk+0x5/0x5f
[Thu Mar 27 18:50:48 2025]  ? __x64_sys_ioctl+0xbb/0xf0
[Thu Mar 27 18:50:48 2025]  ? srso_return_thunk+0x5/0x5f
[Thu Mar 27 18:50:48 2025]  ? __x64_sys_ioctl+0xbb/0xf0
[Thu Mar 27 18:50:48 2025]  ? srso_return_thunk+0x5/0x5f
[Thu Mar 27 18:50:48 2025]  ? syscall_exit_to_user_mode+0x86/0x260
[Thu Mar 27 18:50:48 2025]  ? srso_return_thunk+0x5/0x5f
[Thu Mar 27 18:50:48 2025]  ? __x64_sys_ioctl+0xbb/0xf0
[Thu Mar 27 18:50:48 2025]  ? srso_return_thunk+0x5/0x5f
[Thu Mar 27 18:50:48 2025]  ? syscall_exit_to_user_mode+0x86/0x260
[Thu Mar 27 18:50:48 2025]  ? srso_return_thunk+0x5/0x5f
[Thu Mar 27 18:50:48 2025]  ? do_syscall_64+0x8c/0x180
[Thu Mar 27 18:50:48 2025]  ? irqentry_exit+0x43/0x50
[Thu Mar 27 18:50:48 2025]  ? srso_return_thunk+0x5/0x5f
[Thu Mar 27 18:50:48 2025]  entry_SYSCALL_64_after_hwframe+0x78/0x80
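In case it helps triage, the next time a VM gets stuck I can also grab the kernel stacks of all blocked tasks instead of waiting for the hung-task warning. A rough sketch of what I would run as root (the PID is just the stuck qemu-system-x86 process, 15480 in the trace above):

  # dump the stacks of all tasks in uninterruptible (D) state into dmesg
  echo w > /proc/sysrq-trigger

  # kernel stack of the specific stuck QEMU process
  cat /proc/15480/stack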
At this moment I have not tried to reproduce this yet; I can try running fio on a test host with the same kernel to see if I can consistently break it (a rough sketch of what I would run is below). I also have a crash dump that I made of one of the hosts.
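For the reproduction attempt, what I have in mind is a job that hammers a test file through fio's io_uring engine, roughly like the sketch below. The flags and sizes are only a starting point and the filename is a placeholder, not an actual path on these hosts:

  fio --name=uring-stress --ioengine=io_uring --direct=1 --rw=randrw --bs=4k \
      --iodepth=64 --numjobs=8 --size=8G --runtime=1800 --time_based \
      --filename=/path/to/testfile

For the crash dump, my plan is to open it with the crash utility against the matching debug vmlinux and pull the backtraces of the uninterruptible tasks, along these lines (the dump path is a placeholder):

  crash /usr/lib/debug/boot/vmlinux-6.8.0-56-generic /var/crash/<dumpfile>
  crash> ps | grep UN      # list tasks stuck in uninterruptible sleep
  crash> foreach UN bt     # backtraces of all of them, including the qemu process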
** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: New

https://bugs.launchpad.net/bugs/2105471