I think I see the problem, and AFAICT commit 0a46ef234756dca just makes a latent deadlock easier to hit. The problem is shown by these two stack traces from your dmesg:
Task 1:
[ 247.045575] __wait_on_freeing_inode+0xba/0x140
[ 247.045584] find_inode_fast+0xa4/0xe0
[ 247.045588] iget_locked+0x71/0x200
[ 247.045597] __ext4_iget+0x148/0x1080
[ 247.045615] ext4_xattr_inode_cache_find+0xe2/0x220
[ 247.045621] ext4_xattr_inode_lookup_create+0x122/0x240
[ 247.045626] ext4_xattr_block_set+0xc2/0xeb0
[ 247.045633] ext4_xattr_set_handle+0x4ba/0x650
[ 247.045641] ext4_xattr_set+0x80/0x160

Task 2:
[ 247.043719] mb_cache_entry_wait_unused+0x9a/0xd0
[ 247.043729] ext4_evict_ea_inode+0x64/0xb0
[ 247.043733] ext4_evict_inode+0x35c/0x6d0
[ 247.043739] evict+0x108/0x2c0
[ 247.043745] iput+0x14a/0x260
[ 247.043749] ext4_xattr_ibody_set+0x175/0x1d0
[ 247.043754] ext4_xattr_set_handle+0x297/0x650
[ 247.043762] ext4_xattr_set+0x80/0x160

These two tasks are deadlocked against each other. One has dropped the last reference to the xattr inode and is trying to remove it from memory, waiting for the corresponding mbcache entry to become unused, while the other task holds a reference to that mbcache entry and is waiting for the inode to be evicted from memory (a toy userspace model of this cycle is sketched at the bottom of this mail). Commit 0a46ef234756dca removed synchronization on the buffer lock for one of the hot paths, which makes this race much more likely to hit. I just have to make up my mind how best to fix this ABBA deadlock.

--
https://bugs.launchpad.net/bugs/2080853

Title:
  oracular 6.11 kernel regression with ext4 and ea_inode mount flags and
  exercising xattrs

Status in Linux:
  Confirmed
Status in linux package in Ubuntu:
  New
Status in linux source package in Oracular:
  New

Bug description:
  How to reproduce this issue:

  Kernel: 6.11.0-7, AMD64 virtual machine, oracular, updated 16th Sept 2024 @ 14:15 UK TZ
  8-thread virtual machine (important: there must be multiple CPU threads to trigger the regression)
  20GB virtio drive on /dev/vdb, 1 partition /dev/vdb1

  sudo mkfs.ext4 /dev/vdb1 -O ea_inode
  sudo mount /dev/vdb1 /mnt

  git clone https://github.com/ColinIanKing/stress-ng
  cd stress-ng
  make clean; make -j $(nproc)

  ..wait a couple of minutes, and you will see that the number of running
  processes is not 8 as expected (from the --vmstat output of stress-ng).
  stress-ng cannot be stopped because of the kernel lockup, so use another
  tty and check dmesg; I get the following:

  [ 247.028846] INFO: task jbd2/vdb1-8:1548 blocked for more than 122 seconds.
  [ 247.030830]       Not tainted 6.11.0-7-generic #7-Ubuntu
  [ 247.032667] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [ 247.034170] task:jbd2/vdb1-8     state:D stack:0     pid:1548  tgid:1548  ppid:2      flags:0x00004000
  [ 247.034176] Call Trace:
  [ 247.034178]  <TASK>
  [ 247.034182]  __schedule+0x277/0x6c0
  [ 247.034199]  schedule+0x29/0xd0
  [ 247.034203]  jbd2_journal_wait_updates+0x77/0xf0
  [ 247.034207]  ? __pfx_autoremove_wake_function+0x10/0x10
  [ 247.034213]  jbd2_journal_commit_transaction+0x290/0x1a10
  [ 247.034223]  kjournald2+0xa8/0x250
  [ 247.034228]  ? __pfx_autoremove_wake_function+0x10/0x10
  [ 247.034233]  ? __pfx_kjournald2+0x10/0x10
  [ 247.034236]  kthread+0xe1/0x110
  [ 247.034241]  ? __pfx_kthread+0x10/0x10
  [ 247.034244]  ret_from_fork+0x44/0x70
  [ 247.034247]  ? __pfx_kthread+0x10/0x10
  [ 247.034251]  ret_from_fork_asm+0x1a/0x30
  [ 247.034257]  </TASK>

  NOTE: this works fine on Linux 6.8.0-31, so this looks like a regression in 6.11.0-7.

  Attached is the full kernel log.
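For what it's worth, here is a minimal userspace model of that circular wait. The names and the pthread mutexes are made up for illustration; this is not the kernel code, only the A-then-B vs B-then-A pattern that the two traces above show:

/*
 * Illustration only -- not the kernel code.  Two pthread mutexes stand in
 * for "the inode is being freed" and "the mbcache entry is referenced";
 * each thread pins one state and then blocks waiting for the other, which
 * is the circular wait seen in the two stack traces above.
 *
 * Build with: gcc -pthread abba.c -o abba   (the program never finishes)
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t inode_freeing = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t entry_in_use  = PTHREAD_MUTEX_INITIALIZER;

/* Models the ext4_xattr_set() side: holds an mbcache entry reference,
 * then needs the inode that the other task is already tearing down. */
static void *xattr_setter(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&entry_in_use);
	usleep(100 * 1000);                  /* widen the race window      */
	pthread_mutex_lock(&inode_freeing);  /* blocks forever (A then B)  */
	pthread_mutex_unlock(&inode_freeing);
	pthread_mutex_unlock(&entry_in_use);
	return NULL;
}

/* Models the iput()/evict() side: owns the dying inode, then waits for
 * the mbcache entry reference count to drop, which never happens. */
static void *inode_evictor(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&inode_freeing);
	usleep(100 * 1000);
	pthread_mutex_lock(&entry_in_use);   /* blocks forever (B then A)  */
	pthread_mutex_unlock(&entry_in_use);
	pthread_mutex_unlock(&inode_freeing);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, xattr_setter, NULL);
	pthread_create(&b, NULL, inode_evictor, NULL);
	pthread_join(a, NULL);               /* never returns               */
	pthread_join(b, NULL);
	puts("no deadlock this time");       /* unreachable with the sleeps */
	return 0;
}

As with any ABBA deadlock, the cycle disappears if either side gives up the resource it holds before waiting for the other one; which side to break in ext4 is exactly the question above.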