** Description changed: [impact] When running heavy IO operations on sysfs, the kernel will lock up and many hung tasks will be printed in dmesg. This issue is due to a bug in the kernfs driver and was fixed upstream through 4 commits: 06fb4736139f kernfs: change kernfs_rename_lock into a read-write lock. c9f2dfb7b59e kernfs: Use a per-fs rwsem to protect per-fs list of kernfs_super_info. 9caf69614225 kernfs: Introduce separate rwsem to protect inode attributes. 393c3714081a kernfs: switch global kernfs_rwsem lock to per-fs lock I have backported these commits and also backported two other required commits that these commit depend on: - 44a41882575b kernfs: dont take i_lock on inode attr read. - f44a3ca1e533 kernfs: move struct kernfs_root out of the public view. + 44a41882575b kernfs: dont take i_lock on inode attr read. + f44a3ca1e533 kernfs: move struct kernfs_root out of the public view. 5 other commits are required as well that contain fixes for the above mentioned commits: - 0559f63057f9 kernfs: fix missing kernfs_iattr_rwsem locking - 72b5d5aef246 kernfs: fix potential NULL dereference in __kernfs_remove - ad8d869343ae kernfs: fix NULL dereferencing in kernfs_remove - f3a690227f07 kernfs: remove redundant kernfs_rwsem declaration. - 555a0ce4558d kernfs: prevent early freeing of root node + 0559f63057f9 kernfs: fix missing kernfs_iattr_rwsem locking + 72b5d5aef246 kernfs: fix potential NULL dereference in __kernfs_remove + ad8d869343ae kernfs: fix NULL dereferencing in kernfs_remove + f3a690227f07 kernfs: remove redundant kernfs_rwsem declaration. + 555a0ce4558d kernfs: prevent early freeing of root node - In total 111 commits are required. + In total 11 commits are required. One of the main signs of hitting this bug, is seeing a huge number of tasks in the Uninterruptible state when analyzing the core dump: crash> ps -S RU: 99 UN: 7725 IN: 6113 ID: 444 Stack traces can be very different based on the application locking up but what they all have in common is that they are waiting for a mutex to unlock: [308188.415026] INFO: task READACTED:15703 blocked for more than 360 seconds. [308188.415362] Not tainted 5.15.0-153-generic #163-Ubuntu [308188.415749] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [308188.416054] task:REDACTED state:D stack: 0 pid:15703 ppid: 15160 flags:0x00000000 [308188.416347] Call Trace: [308188.416634] <TASK> [308188.416917] __schedule+0x24e/0x590 [308188.417202] schedule+0x69/0x110 [308188.417506] schedule_preempt_disabled+0xe/0x20 [308188.417789] __mutex_lock.constprop.0+0x267/0x490 [308188.418052] ? rtnl_getlink+0x420/0x420 [308188.418311] __mutex_lock_slowpath+0x13/0x20 [308188.418575] mutex_lock+0x38/0x50 [308188.418818] __netlink_dump_start+0xbf/0x2f0 [308188.419061] ? rtnl_getlink+0x420/0x420 [308188.419321] rtnetlink_rcv_msg+0x2af/0x400 [308188.419564] ? rtnl_getlink+0x420/0x420 [308188.419807] ? rtnl_calcit.isra.0+0x130/0x130 [308188.420051] netlink_rcv_skb+0x53/0x100 [308188.420297] rtnetlink_rcv+0x15/0x20 [308188.420539] netlink_unicast+0x220/0x340 [308188.420781] netlink_sendmsg+0x24b/0x4c0 [308188.421023] __sock_sendmsg+0x66/0x70 [308188.421262] __sys_sendto+0x113/0x190 [308188.421511] ? __audit_syscall_exit+0x269/0x2d0 [308188.421754] ? __audit_syscall_entry+0xde/0x120 [308188.422024] __x64_sys_sendto+0x24/0x30 [308188.422258] x64_sys_call+0x1bcb/0x1fa0 [308188.422485] do_syscall_64+0x56/0xb0 [308188.422711] entry_SYSCALL_64_after_hwframe+0x6c/0xd6 [308188.422946] RIP: 0033:0x48fbae [308188.423196] RSP: 002b:000000c0038b1480 EFLAGS: 00000216 ORIG_RAX: 000000000000002c [308188.423454] RAX: ffffffffffffffda RBX: 000000000000008a RCX: 000000000048fbae [308188.423694] RDX: 0000000000000011 RSI: 000000c0074b84e0 RDI: 000000000000008a [308188.423935] RBP: 000000c0038b14c0 R08: 000000c0074b84d4 R09: 000000000000000c [308188.424180] R10: 0000000000000000 R11: 0000000000000216 R12: 000000c0074b84e0 [308188.424422] R13: 0000000000000000 R14: 000000c0039a2540 R15: 00000000000007c6 [308188.424667] </TASK> The issue currently affects Jammy 5.15 kernel and is not present in the 6.8 kernel or above. [Test Plan] Reproducing the issue requires a machine that is able to run a high number of applications concurrently that are hammering kernfs with I/O requests. commit "9caf69614225 kernfs: Introduce separate rwsem to protect inode attributes" contains a C program snippet reproducer. Although in this loop the sysfs being accessed is for an infiniband device, having a any similar program that reads files from sysfs with multiple instances of the program running, will trigger the issue. An example could be reading data from "/sys/class/net/<interface>/statistics/*" instead. In other words running multiple instances of a program that does IO operations on sysfs would confirm if the issue remains or is resolved. To reliably reproduce or confirm the resolution of the issue, dozens if not hundreds of instances should be executed. [ Where problems could occur ] * Since the fix changes the semaphore being used to access kernfs files from one that protects an entire kernfs filesystem to one that protects individual inodes, there is a chance that this might cause race conditions on read/write. * This might also cause deadlocks similar to the ones this bug is reporting.
-- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/2125142 Title: Hung task when heavily accessing kernfs files Status in linux package in Ubuntu: New Status in linux source package in Jammy: New Bug description: [impact] When running heavy IO operations on sysfs, the kernel will lock up and many hung tasks will be printed in dmesg. This issue is due to a bug in the kernfs driver and was fixed upstream through 4 commits: 06fb4736139f kernfs: change kernfs_rename_lock into a read-write lock. c9f2dfb7b59e kernfs: Use a per-fs rwsem to protect per-fs list of kernfs_super_info. 9caf69614225 kernfs: Introduce separate rwsem to protect inode attributes. 393c3714081a kernfs: switch global kernfs_rwsem lock to per-fs lock I have backported these commits and also backported two other required commits that these commit depend on: 44a41882575b kernfs: dont take i_lock on inode attr read. f44a3ca1e533 kernfs: move struct kernfs_root out of the public view. 5 other commits are required as well that contain fixes for the above mentioned commits: 0559f63057f9 kernfs: fix missing kernfs_iattr_rwsem locking 72b5d5aef246 kernfs: fix potential NULL dereference in __kernfs_remove ad8d869343ae kernfs: fix NULL dereferencing in kernfs_remove f3a690227f07 kernfs: remove redundant kernfs_rwsem declaration. 555a0ce4558d kernfs: prevent early freeing of root node In total 11 commits are required. One of the main signs of hitting this bug, is seeing a huge number of tasks in the Uninterruptible state when analyzing the core dump: crash> ps -S RU: 99 UN: 7725 IN: 6113 ID: 444 Stack traces can be very different based on the application locking up but what they all have in common is that they are waiting for a mutex to unlock: [308188.415026] INFO: task READACTED:15703 blocked for more than 360 seconds. [308188.415362] Not tainted 5.15.0-153-generic #163-Ubuntu [308188.415749] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [308188.416054] task:REDACTED state:D stack: 0 pid:15703 ppid: 15160 flags:0x00000000 [308188.416347] Call Trace: [308188.416634] <TASK> [308188.416917] __schedule+0x24e/0x590 [308188.417202] schedule+0x69/0x110 [308188.417506] schedule_preempt_disabled+0xe/0x20 [308188.417789] __mutex_lock.constprop.0+0x267/0x490 [308188.418052] ? rtnl_getlink+0x420/0x420 [308188.418311] __mutex_lock_slowpath+0x13/0x20 [308188.418575] mutex_lock+0x38/0x50 [308188.418818] __netlink_dump_start+0xbf/0x2f0 [308188.419061] ? rtnl_getlink+0x420/0x420 [308188.419321] rtnetlink_rcv_msg+0x2af/0x400 [308188.419564] ? rtnl_getlink+0x420/0x420 [308188.419807] ? rtnl_calcit.isra.0+0x130/0x130 [308188.420051] netlink_rcv_skb+0x53/0x100 [308188.420297] rtnetlink_rcv+0x15/0x20 [308188.420539] netlink_unicast+0x220/0x340 [308188.420781] netlink_sendmsg+0x24b/0x4c0 [308188.421023] __sock_sendmsg+0x66/0x70 [308188.421262] __sys_sendto+0x113/0x190 [308188.421511] ? __audit_syscall_exit+0x269/0x2d0 [308188.421754] ? __audit_syscall_entry+0xde/0x120 [308188.422024] __x64_sys_sendto+0x24/0x30 [308188.422258] x64_sys_call+0x1bcb/0x1fa0 [308188.422485] do_syscall_64+0x56/0xb0 [308188.422711] entry_SYSCALL_64_after_hwframe+0x6c/0xd6 [308188.422946] RIP: 0033:0x48fbae [308188.423196] RSP: 002b:000000c0038b1480 EFLAGS: 00000216 ORIG_RAX: 000000000000002c [308188.423454] RAX: ffffffffffffffda RBX: 000000000000008a RCX: 000000000048fbae [308188.423694] RDX: 0000000000000011 RSI: 000000c0074b84e0 RDI: 000000000000008a [308188.423935] RBP: 000000c0038b14c0 R08: 000000c0074b84d4 R09: 000000000000000c [308188.424180] R10: 0000000000000000 R11: 0000000000000216 R12: 000000c0074b84e0 [308188.424422] R13: 0000000000000000 R14: 000000c0039a2540 R15: 00000000000007c6 [308188.424667] </TASK> The issue currently affects Jammy 5.15 kernel and is not present in the 6.8 kernel or above. [Test Plan] Reproducing the issue requires a machine that is able to run a high number of applications concurrently that are hammering kernfs with I/O requests. commit "9caf69614225 kernfs: Introduce separate rwsem to protect inode attributes" contains a C program snippet reproducer. Although in this loop the sysfs being accessed is for an infiniband device, having a any similar program that reads files from sysfs with multiple instances of the program running, will trigger the issue. An example could be reading data from "/sys/class/net/<interface>/statistics/*" instead. In other words running multiple instances of a program that does IO operations on sysfs would confirm if the issue remains or is resolved. To reliably reproduce or confirm the resolution of the issue, dozens if not hundreds of instances should be executed. [ Where problems could occur ] * Since the fix changes the semaphore being used to access kernfs files from one that protects an entire kernfs filesystem to one that protects individual inodes, there is a chance that this might cause race conditions on read/write. * This might also cause deadlocks similar to the ones this bug is reporting. To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2125142/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : [email protected] Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp

