[Kernel-packages] [Bug 2125142] Re: Hung task when heavily accessing kernfs files

Ghadi Rahme Mon, 22 Sep 2025 07:37:50 -0700

** Description changed:

  [impact]
  
  When running heavy IO operations on sysfs, the kernel will lock up and many
  hung tasks will be printed in dmesg.
  This issue is due to a bug in the kernfs driver and was fixed upstream
  through 4 commits:
  
      06fb4736139f kernfs: change kernfs_rename_lock into a read-write lock.
      c9f2dfb7b59e kernfs: Use a per-fs rwsem to protect per-fs list of kernfs_super_info.
      9caf69614225 kernfs: Introduce separate rwsem to protect inode attributes.
      393c3714081a kernfs: switch global kernfs_rwsem lock to per-fs lock
  
  I have backported these commits and also backported two other required
  commits that these commits depend on:
  
- 44a41882575b kernfs: dont take i_lock on inode attr read.
- f44a3ca1e533 kernfs: move struct kernfs_root out of the public view.
+     44a41882575b kernfs: dont take i_lock on inode attr read.
+     f44a3ca1e533 kernfs: move struct kernfs_root out of the public view.
  
  5 other commits are required as well that contain fixes for the above
  mentioned commits:
  
- 0559f63057f9 kernfs: fix missing kernfs_iattr_rwsem locking
- 72b5d5aef246 kernfs: fix potential NULL dereference in __kernfs_remove
- ad8d869343ae kernfs: fix NULL dereferencing in kernfs_remove
- f3a690227f07 kernfs: remove redundant kernfs_rwsem declaration.
- 555a0ce4558d kernfs: prevent early freeing of root node
+     0559f63057f9 kernfs: fix missing kernfs_iattr_rwsem locking
+     72b5d5aef246 kernfs: fix potential NULL dereference in __kernfs_remove
+     ad8d869343ae kernfs: fix NULL dereferencing in kernfs_remove
+     f3a690227f07 kernfs: remove redundant kernfs_rwsem declaration.
+     555a0ce4558d kernfs: prevent early freeing of root node
  
- In total 111 commits are required.
+ In total 11 commits are required.
  
  One of the main signs of hitting this bug is seeing a huge number of
  tasks in the Uninterruptible state when analyzing the core dump:
  
  crash> ps -S
    RU: 99
    UN: 7725
    IN: 6113
    ID: 444
  
  Stack traces can be very different based on the application locking up
  but what they all have in common is that they are waiting for a mutex to
  unlock:
  
  [308188.415026] INFO: task REDACTED:15703 blocked for more than 360 seconds.
  [308188.415362]       Not tainted 5.15.0-153-generic #163-Ubuntu
  [308188.415749] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [308188.416054] task:REDACTED     state:D stack:    0 pid:15703 ppid: 15160 flags:0x00000000
  [308188.416347] Call Trace:
  [308188.416634]  <TASK>
  [308188.416917]  __schedule+0x24e/0x590
  [308188.417202]  schedule+0x69/0x110
  [308188.417506]  schedule_preempt_disabled+0xe/0x20
  [308188.417789]  __mutex_lock.constprop.0+0x267/0x490
  [308188.418052]  ? rtnl_getlink+0x420/0x420
  [308188.418311]  __mutex_lock_slowpath+0x13/0x20
  [308188.418575]  mutex_lock+0x38/0x50
  [308188.418818]  __netlink_dump_start+0xbf/0x2f0
  [308188.419061]  ? rtnl_getlink+0x420/0x420
  [308188.419321]  rtnetlink_rcv_msg+0x2af/0x400
  [308188.419564]  ? rtnl_getlink+0x420/0x420
  [308188.419807]  ? rtnl_calcit.isra.0+0x130/0x130
  [308188.420051]  netlink_rcv_skb+0x53/0x100
  [308188.420297]  rtnetlink_rcv+0x15/0x20
  [308188.420539]  netlink_unicast+0x220/0x340
  [308188.420781]  netlink_sendmsg+0x24b/0x4c0
  [308188.421023]  __sock_sendmsg+0x66/0x70
  [308188.421262]  __sys_sendto+0x113/0x190
  [308188.421511]  ? __audit_syscall_exit+0x269/0x2d0
  [308188.421754]  ? __audit_syscall_entry+0xde/0x120
  [308188.422024]  __x64_sys_sendto+0x24/0x30
  [308188.422258]  x64_sys_call+0x1bcb/0x1fa0
  [308188.422485]  do_syscall_64+0x56/0xb0
  [308188.422711]  entry_SYSCALL_64_after_hwframe+0x6c/0xd6
  [308188.422946] RIP: 0033:0x48fbae
  [308188.423196] RSP: 002b:000000c0038b1480 EFLAGS: 00000216 ORIG_RAX: 000000000000002c
  [308188.423454] RAX: ffffffffffffffda RBX: 000000000000008a RCX: 000000000048fbae
  [308188.423694] RDX: 0000000000000011 RSI: 000000c0074b84e0 RDI: 000000000000008a
  [308188.423935] RBP: 000000c0038b14c0 R08: 000000c0074b84d4 R09: 000000000000000c
  [308188.424180] R10: 0000000000000000 R11: 0000000000000216 R12: 000000c0074b84e0
  [308188.424422] R13: 0000000000000000 R14: 000000c0039a2540 R15: 00000000000007c6
  [308188.424667]  </TASK>
  
  The issue currently affects the Jammy 5.15 kernel and is not present in
  the 6.8 kernel or above.
  
  [Test Plan]
  
  Reproducing the issue requires a machine that is able to run a high
  number of applications concurrently that are hammering kernfs with I/O
  requests.
  
  Commit "9caf69614225 kernfs: Introduce separate rwsem to protect inode
  attributes" contains a C reproducer snippet.
  
  Although the loop in that reproducer accesses sysfs files for an InfiniBand
  device, any similar program that reads files from sysfs will trigger the
  issue when multiple instances of it run concurrently.
  An example could be reading data from
  "/sys/class/net/<interface>/statistics/*" instead.
  
  In other words, running multiple instances of a program that does IO
  operations on sysfs would confirm whether the issue remains or is resolved.
  To reliably reproduce or confirm the resolution of the issue, dozens if
  not hundreds of instances should be executed.
  
  [ Where problems could occur ]
  
  * Since the fix changes the semaphore being used to access kernfs files from
    one that protects an entire kernfs filesystem to one that protects
    individual inodes, there is a chance that this might cause race conditions
    on read/write.
  * This might also cause deadlocks similar to the ones this bug is reporting.


-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2125142

Title:
  Hung task when heavily accessing kernfs files

Status in linux package in Ubuntu:
  New
Status in linux source package in Jammy:
  New

Bug description:
  [impact]

  When running heavy IO operations on sysfs, the kernel will lock up and many
  hung tasks will be printed in dmesg.
  This issue is due to a bug in the kernfs driver and was fixed upstream
  through 4 commits:

      06fb4736139f kernfs: change kernfs_rename_lock into a read-write lock.
      c9f2dfb7b59e kernfs: Use a per-fs rwsem to protect per-fs list of kernfs_super_info.
      9caf69614225 kernfs: Introduce separate rwsem to protect inode attributes.
      393c3714081a kernfs: switch global kernfs_rwsem lock to per-fs lock

  I have backported these commits and also backported two other required
  commits that these commits depend on:

      44a41882575b kernfs: dont take i_lock on inode attr read.
      f44a3ca1e533 kernfs: move struct kernfs_root out of the public view.

  5 other commits are required as well that contain fixes for the above
  mentioned commits:

      0559f63057f9 kernfs: fix missing kernfs_iattr_rwsem locking
      72b5d5aef246 kernfs: fix potential NULL dereference in __kernfs_remove
      ad8d869343ae kernfs: fix NULL dereferencing in kernfs_remove
      f3a690227f07 kernfs: remove redundant kernfs_rwsem declaration.
      555a0ce4558d kernfs: prevent early freeing of root node

  In total 11 commits are required.

  One of the main signs of hitting this bug is seeing a huge number of
  tasks in the Uninterruptible state when analyzing the core dump:

  crash> ps -S
    RU: 99
    UN: 7725
    IN: 6113
    ID: 444
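
  On a live system, without a core dump, a rough equivalent of this check
  is to count tasks by state from /proc. The following is a minimal Python
  sketch and not part of the original report; the helper names parse_state
  and count_task_states are hypothetical. It assumes a Linux /proc layout,
  where state D in /proc/<pid>/task/<tid>/stat is uninterruptible sleep
  (the state crash reports as UN).

```python
import os
from collections import Counter


def parse_state(stat_line):
    """Return the one-letter task state from a /proc/<pid>/stat line.

    The comm field is wrapped in parentheses and may itself contain spaces
    or parentheses, so the state is the first token after the last ')'.
    """
    return stat_line.rsplit(")", 1)[1].split()[0]


def count_task_states(proc="/proc"):
    """Count all threads on the system by state (hypothetical helper)."""
    counts = Counter()
    for pid in filter(str.isdigit, os.listdir(proc)):
        task_dir = os.path.join(proc, pid, "task")
        try:
            tids = os.listdir(task_dir)
        except OSError:
            continue  # the process exited while we were scanning
        for tid in tids:
            try:
                with open(os.path.join(task_dir, tid, "stat")) as f:
                    counts[parse_state(f.read())] += 1
            except (OSError, IndexError):
                continue  # the thread exited or the line was unreadable
    return counts
```

  Printing the result of count_task_states() gives per-state totals. A D
  count in the thousands that keeps growing would be consistent with the
  symptom described here, although it does not by itself prove that kernfs
  is the cause.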

  Stack traces can be very different based on the application locking up
  but what they all have in common is that they are waiting for a mutex
  to unlock:

  [308188.415026] INFO: task REDACTED:15703 blocked for more than 360 seconds.
  [308188.415362]       Not tainted 5.15.0-153-generic #163-Ubuntu
  [308188.415749] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [308188.416054] task:REDACTED     state:D stack:    0 pid:15703 ppid: 15160 flags:0x00000000
  [308188.416347] Call Trace:
  [308188.416634]  <TASK>
  [308188.416917]  __schedule+0x24e/0x590
  [308188.417202]  schedule+0x69/0x110
  [308188.417506]  schedule_preempt_disabled+0xe/0x20
  [308188.417789]  __mutex_lock.constprop.0+0x267/0x490
  [308188.418052]  ? rtnl_getlink+0x420/0x420
  [308188.418311]  __mutex_lock_slowpath+0x13/0x20
  [308188.418575]  mutex_lock+0x38/0x50
  [308188.418818]  __netlink_dump_start+0xbf/0x2f0
  [308188.419061]  ? rtnl_getlink+0x420/0x420
  [308188.419321]  rtnetlink_rcv_msg+0x2af/0x400
  [308188.419564]  ? rtnl_getlink+0x420/0x420
  [308188.419807]  ? rtnl_calcit.isra.0+0x130/0x130
  [308188.420051]  netlink_rcv_skb+0x53/0x100
  [308188.420297]  rtnetlink_rcv+0x15/0x20
  [308188.420539]  netlink_unicast+0x220/0x340
  [308188.420781]  netlink_sendmsg+0x24b/0x4c0
  [308188.421023]  __sock_sendmsg+0x66/0x70
  [308188.421262]  __sys_sendto+0x113/0x190
  [308188.421511]  ? __audit_syscall_exit+0x269/0x2d0
  [308188.421754]  ? __audit_syscall_entry+0xde/0x120
  [308188.422024]  __x64_sys_sendto+0x24/0x30
  [308188.422258]  x64_sys_call+0x1bcb/0x1fa0
  [308188.422485]  do_syscall_64+0x56/0xb0
  [308188.422711]  entry_SYSCALL_64_after_hwframe+0x6c/0xd6
  [308188.422946] RIP: 0033:0x48fbae
  [308188.423196] RSP: 002b:000000c0038b1480 EFLAGS: 00000216 ORIG_RAX: 000000000000002c
  [308188.423454] RAX: ffffffffffffffda RBX: 000000000000008a RCX: 000000000048fbae
  [308188.423694] RDX: 0000000000000011 RSI: 000000c0074b84e0 RDI: 000000000000008a
  [308188.423935] RBP: 000000c0038b14c0 R08: 000000c0074b84d4 R09: 000000000000000c
  [308188.424180] R10: 0000000000000000 R11: 0000000000000216 R12: 000000c0074b84e0
  [308188.424422] R13: 0000000000000000 R14: 000000c0039a2540 R15: 00000000000007c6
  [308188.424667]  </TASK>

  The issue currently affects the Jammy 5.15 kernel and is not present in
  the 6.8 kernel or above.

  [Test Plan]

  Reproducing the issue requires a machine that is able to run a high
  number of applications concurrently that are hammering kernfs with I/O
  requests.

  Commit "9caf69614225 kernfs: Introduce separate rwsem to protect inode
  attributes" contains a C reproducer snippet.

  Although the loop in that reproducer accesses sysfs files for an InfiniBand
  device, any similar program that reads files from sysfs will trigger the
  issue when multiple instances of it run concurrently.
  An example could be reading data from
  "/sys/class/net/<interface>/statistics/*" instead.

  In other words, running multiple instances of a program that does IO
  operations on sysfs would confirm whether the issue remains or is resolved.
  To reliably reproduce or confirm the resolution of the issue, dozens
  if not hundreds of instances should be executed.
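
  The test described above could be sketched as follows. This is a minimal
  Python load generator, not the C reproducer from commit 9caf69614225. The
  function names, the worker count and the duration are assumptions chosen
  for illustration, and the caller supplies the glob pattern of sysfs files
  to read.

```python
import glob
import multiprocessing
import time


def read_files(paths):
    """Read each file once and return how many reads succeeded."""
    ok = 0
    for path in paths:
        try:
            with open(path, "rb") as f:
                f.read()
            ok += 1
        except OSError:
            pass  # the attribute vanished or is not readable
    return ok


def worker(pattern, seconds):
    """Re-read every file matching pattern until the time budget runs out."""
    deadline = time.monotonic() + seconds
    paths = glob.glob(pattern)
    while paths and time.monotonic() < deadline:
        read_files(paths)


def hammer(pattern, workers, seconds):
    """Run `workers` concurrent reader processes against the same files."""
    procs = [
        multiprocessing.Process(target=worker, args=(pattern, seconds))
        for _ in range(workers)
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

  For example, hammer("/sys/class/net/*/statistics/*", workers=200,
  seconds=60), called under an `if __name__ == "__main__":` guard, starts
  200 readers for one minute. Here the `*` in place of <interface> widens
  the path to all interfaces, which is an assumption. Separate processes
  are used rather than threads so that the readers contend in the kernel
  and are not serialized by the interpreter. Whether this load is enough
  to hit the hang on a given machine is not guaranteed.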

  [ Where problems could occur ]

  * Since the fix changes the semaphore being used to access kernfs files from
    one that protects an entire kernfs filesystem to one that protects
    individual inodes, there is a chance that this might cause race conditions
    on read/write.
  * This might also cause deadlocks similar to the ones this bug is reporting.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2125142/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp