** Summary changed:

- [Regression] kernel 5.15.0-144-generic - discard broken with RAID10
+ raid10: block discard causes a NULL pointer dereference after 5.15.0-144-generic

** Description changed:

- After upgrading to jammy kernel 5.15.0-144-generic we encountered a
- serious regression when the weekly fstrim timer ran.
+ BugLink: https://bugs.launchpad.net/bugs/2117395
- This bug was introduced by commit "md/raid10: fix missing discard IO accounting"
- https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=4a05f7ae33716d996c5ce56478a36a3ede1d76f2
- which was backported to all stable kernels and became part of 5.15.181
+ [Impact]
- The issue was discovered earlier upstream[1] and also in Debian[2],
- which resulted in a fix being added to the Debian kernel and
- subsequently into 6.1. However the missing patch[3] did not make it into
- the 5.15-stable kernel triggering the regression also in Ubuntu jammy.
+ The commit below was backported to -stable in 5.15.181 and introduced a NULL
+ pointer dereference in the raid10 discard path, because io_acct_set is only
+ initialised by the raid0 and raid456 personalities, not by raid1 or raid10.
+ commit d05af90d6218e9c8f1c2026990c3f53c1b41bfb0
+ Author: Yu Kuai <yuku...@huawei.com>
+ Date: Tue Mar 25 09:57:46 2025 +0800
+ Subject: md/raid10: fix missing discard IO accounting
+ Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d05af90d6218e9c8f1c2026990c3f53c1b41bfb0
- [1] https://lists.linaro.org/archives/list/linux-stable-mir...@lists.linaro.org/thread/TM2PPS3XKE6M5H2FW63MLZV2T7HTM3QJ/
- [2] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1104460
- [3] https://lore.kernel.org/all/20230621165110.1498313-2-yuku...@huaweicloud.com/
-
-
- dmesg:
+ Kernel oops:
  kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
  kernel: #PF: supervisor instruction fetch in kernel mode
  kernel: #PF: error_code(0x0010) - not-present page
- kernel: PGD 0 P4D 0
+ kernel: PGD 0 P4D 0
  kernel: Oops: 0010 [#1] SMP PTI
  kernel: CPU: 5 PID: 784107 Comm: fstrim Not tainted 5.15.0-144-generic #157-Ubuntu
- kernel: Hardware name: FUJITSU /D3417-B2, BIOS V5.0.0.12 R1.27.0.SR.1 for D3417-B2x 06/10/2020
  kernel: RIP: 0010:0x0
  kernel: Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
  kernel: RSP: 0018:ffffb576409c7858 EFLAGS: 00010206
  kernel: RAX: 0000000000000000 RBX: 0000000000092800 RCX: 0000000000000001
  kernel: RDX: ffff8e7e012426f0 RSI: 0000000000000000 RDI: 0000000000092800
  kernel: RBP: ffffb576409c78c8 R08: ffff8e884ec966c0 R09: ffff8e7e07c6b050
  kernel: R10: 0000000000002ecb R11: 00000000000030c8 R12: 0000000000092c00
  kernel: R13: 0000000000000400 R14: ffff8e7e01242708 R15: ffff8e7e10743400
- kernel: FS: 00007f6fff9f0800(0000) GS:ffff8e8cee540000(0000) knlGS:0000000000000000
- kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
+ kernel: FS: 00007f6fff9f0800(0000) GS:ffff8e8cee540000(0000) knlGS:0000000000000000
+ kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  kernel: CR2: ffffffffffffffd6 CR3: 00000001090f6005 CR4: 00000000003706e0
  kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  kernel: Call Trace:
- kernel: <TASK>
- kernel: mempool_alloc+0x61/0x1b0
- kernel: ? __kmalloc+0x179/0x330
- kernel: bio_alloc_bioset+0x9d/0x370
- kernel: ? r10bio_pool_alloc+0x26/0x30 [raid10]
- kernel: bio_clone_fast+0x1f/0x90
- kernel: md_account_bio+0x42/0x80
- kernel: raid10_handle_discard+0x56f/0x6b0 [raid10]
- kernel: raid10_make_request+0x147/0x180 [raid10]
- kernel: md_handle_request+0x12a/0x1b0
- kernel: ? submit_bio_checks+0x1a5/0x580
- kernel: md_submit_bio+0x76/0xc0
- kernel: __submit_bio+0x1a2/0x220
- kernel: ? mempool_alloc_slab+0x17/0x20
- kernel: ? mempool_alloc+0x61/0x1b0
- kernel: ? schedule_timeout+0x91/0x140
- kernel: __submit_bio_noacct+0x85/0x200
- kernel: submit_bio_noacct+0x4e/0x120
- kernel: ? __cond_resched+0x1a/0x60
- kernel: submit_bio+0x4a/0x130
- kernel: submit_bio_wait+0x5a/0xc0
- kernel: blkdev_issue_discard+0x7e/0xd0
- kernel: ext4_try_to_trim_range+0x2db/0x520
- kernel: ? ext4_mb_load_buddy_gfp+0x91/0x3e0
- kernel: ext4_trim_fs+0x313/0x510
- kernel: __ext4_ioctl+0x82c/0xef0
- kernel: ext4_ioctl+0xe/0x20
- kernel: __x64_sys_ioctl+0x92/0xd0
- kernel: x64_sys_call+0x1e5f/0x1fa0
- kernel: do_syscall_64+0x56/0xb0
- kernel: entry_SYSCALL_64_after_hwframe+0x6c/0xd6
- kernel: RIP: 0033:0x7f6fffc0994f
- kernel: Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 >
- kernel: RSP: 002b:00007ffdce979c30 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
- kernel: RAX: ffffffffffffffda RBX: 00007ffdce979d80 RCX: 00007f6fffc0994f
- kernel: RDX: 00007ffdce979ca0 RSI: 00000000c0185879 RDI: 0000000000000003
- kernel: RBP: 0000558436acccb0 R08: 0000558436acccb0 R09: 0000000000000000
- kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003
- kernel: R13: 0000558436accfa0 R14: 0000558436acce80 R15: 0000558436acce80
- kernel: </TASK>
- kernel: Modules linked in: tls tcp_diag udp_diag inet_diag bridge stp llc nft_counter nft_chain_nat nf_nat >
- kernel: xhci_pci_renesas wmi video
- kernel: CR2: 0000000000000000
- kernel: ---[ end trace db9334d27f904581 ]---
- kernel: RIP: 0010:0x0
- kernel: Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
- kernel: RSP: 0018:ffffb576409c7858 EFLAGS: 00010206
- kernel: RAX: 0000000000000000 RBX: 0000000000092800 RCX: 0000000000000001
- kernel: RDX: ffff8e7e012426f0 RSI: 0000000000000000 RDI: 0000000000092800
- kernel: RBP: ffffb576409c78c8 R08: ffff8e884ec966c0 R09: ffff8e7e07c6b050
- kernel: R10: 0000000000002ecb R11: 00000000000030c8 R12: 0000000000092c00
- kernel: R13: 0000000000000400 R14: ffff8e7e01242708 R15: ffff8e7e10743400
- kernel: FS: 00007f6fff9f0800(0000) GS:ffff8e8cee540000(0000) knlGS:0000000000000000
- kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
- kernel: CR2: ffffffffffffffd6 CR3: 00000001090f6005 CR4: 00000000003706e0
- kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
- kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
- kernel: BUG: unable to handle page fault for address: ffffb57600000010
+ kernel: <TASK>
+ kernel: mempool_alloc+0x61/0x1b0
+ kernel: ? __kmalloc+0x179/0x330
+ kernel: bio_alloc_bioset+0x9d/0x370
+ kernel: ? r10bio_pool_alloc+0x26/0x30 [raid10]
+ kernel: bio_clone_fast+0x1f/0x90
+ kernel: md_account_bio+0x42/0x80
+ kernel: raid10_handle_discard+0x56f/0x6b0 [raid10]
+ kernel: raid10_make_request+0x147/0x180 [raid10]
+ kernel: md_handle_request+0x12a/0x1b0
+ kernel: ? submit_bio_checks+0x1a5/0x580
+ kernel: md_submit_bio+0x76/0xc0
+ kernel: __submit_bio+0x1a2/0x220
+ kernel: ? mempool_alloc_slab+0x17/0x20
+ kernel: ? mempool_alloc+0x61/0x1b0
+ kernel: ? schedule_timeout+0x91/0x140
+ kernel: __submit_bio_noacct+0x85/0x200
+ kernel: submit_bio_noacct+0x4e/0x120
+ kernel: ? __cond_resched+0x1a/0x60
+ kernel: submit_bio+0x4a/0x130
+ kernel: submit_bio_wait+0x5a/0xc0
+ kernel: blkdev_issue_discard+0x7e/0xd0
+ kernel: ext4_try_to_trim_range+0x2db/0x520
+ kernel: ? ext4_mb_load_buddy_gfp+0x91/0x3e0
+ kernel: ext4_trim_fs+0x313/0x510
+ kernel: __ext4_ioctl+0x82c/0xef0
+ kernel: ext4_ioctl+0xe/0x20
+ kernel: __x64_sys_ioctl+0x92/0xd0
+ kernel: x64_sys_call+0x1e5f/0x1fa0
+ kernel: do_syscall_64+0x56/0xb0
+ kernel: entry_SYSCALL_64_after_hwframe+0x6c/0xd6
+
+ A workaround is to disable the weekly systemd fstrim timer (fstrim.timer)
+ and to avoid running fstrim or otherwise discarding blocks until the fix is
+ available.
+
+ [Fix]
+
+ The following commit, merged into mainline in 6.6-rc1, is required and needs
+ to be backported to jammy.
+
+ commit c567c86b90d4715081adfe5eb812141a5b6b4883
+ Author: Yu Kuai <yuku...@huawei.com>
+ Date: Thu Jun 22 00:51:03 2023 +0800
+ Subject: md: move initialization and destruction of 'io_acct_set' to md.c
+ Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c567c86b90d4715081adfe5eb812141a5b6b4883
+
+ The backport needs a minor adjustment: the upstream change to __md_stop() is
+ applied to md_stop() instead.
+
+ [Testcase]
+
+ You will need a machine with at least four NVMe drives that support block
+ discard. I use an i3.8xlarge instance on AWS, since it provides all of these.
+
+ $ lsblk
+ xvda    202:0    0    8G  0 disk
+ └─xvda1 202:1    0    8G  0 part /
+ nvme0n1 259:2    0  1.7T  0 disk
+ nvme1n1 259:0    0  1.7T  0 disk
+ nvme2n1 259:1    0  1.7T  0 disk
+ nvme3n1 259:3    0  1.7T  0 disk
+
+ Create a RAID10 array:
+
+ $ sudo mdadm --create --verbose /dev/md0 --level=10 --raid-devices=4 \
+     /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1
+
+ Format the array with XFS and mount it:
+
+ $ sudo mkfs.xfs /dev/md0
+
+ $ sudo mkdir /mnt/disk
+ $ sudo mount /dev/md0 /mnt/disk
+
+ Run fstrim:
+
+ $ sudo fstrim /mnt/disk
+
+ Test packages are available in the following PPA:
+
+ https://launchpad.net/~mruffell/+archive/ubuntu/sf414897-test
+
+ With the test kernel installed, fstrim no longer triggers the NULL pointer
+ dereference.
+
+ [Where problems can occur]
+
+ This changes io_acct_set from being initialised only by some personalities
+ (mainly raid0 and raid456) to being initialised for every raid level.
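Because the backport makes io_acct_set initialised for every md personality, and the original crash needs a RAID10 array whose member devices accept discards, it can help to check which arrays on a host use which raid level and which devices accept discards before relying on the fstrim workaround. The following is a minimal POSIX shell sketch, not part of the fix or of any package: the function names md_array_levels and supports_discard are hypothetical, and it assumes input in /proc/mdstat format and the sysfs attribute queue/discard_max_bytes, where a value of 0 means the device does not support discard.

```shell
#!/bin/sh
# Hypothetical helper functions for illustration only; they are not part
# of mdadm, util-linux or the kernel.

# md_array_levels: read text in /proc/mdstat format on stdin and print
# "<array> <level>" for every active array whose level is raidN or linear.
# Array lines look like
#   md0 : active raid10 nvme3n1[3] nvme2n1[2] nvme1n1[1] nvme0n1[0]
# and the level may be preceded by flags such as (auto-read-only).
# Inactive arrays list no level, so they are skipped.
md_array_levels() {
    awk '$2 == ":" && $3 == "active" {
        for (i = 4; i <= NF; i++) {
            if ($i ~ /^(raid[0-9]+|linear)$/) { print $1, $i; break }
        }
    }'
}

# supports_discard: succeed if block device $1 (e.g. nvme0n1 or md0)
# reports a non-zero queue/discard_max_bytes in sysfs. $2 optionally
# overrides the sysfs mount point, which defaults to /sys. The body runs
# in a subshell so the temporary variable f does not leak to the caller.
supports_discard() (
    f="${2:-/sys}/block/$1/queue/discard_max_bytes"
    [ -r "$f" ] && [ "$(cat "$f")" -gt 0 ]
)
```

On a live host this might be run as `md_array_levels < /proc/mdstat`, followed by `supports_discard nvme0n1 && echo "nvme0n1 accepts discards"` for each member device. `lsblk --discard` reports the same limit in its DISC-MAX column if you prefer not to read sysfs directly.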
+
+ If a regression were to occur, it would likely affect block discard on any
+ raid level, not just raid10. However, raid10 carries more risk: discard
+ support for raid10 is comparatively recent (roughly the last five years),
+ whereas raid0 and raid456 have had full discard support for a decade or more,
+ so raid10 may be missing more prerequisite patches.
+
+ The workarounds would be the same: disable the systemd fstrim timer, or stop
+ running fstrim.
+
+ [Other info]
+
+ Upstream bug:
+ https://lists.linaro.org/archives/list/linux-stable-mir...@lists.linaro.org/thread/TM2PPS3XKE6M5H2FW63MLZV2T7HTM3QJ/
+
+ Debian bug:
+ https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1104460

--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2117395

Title:
  raid10: block discard causes a NULL pointer dereference after 5.15.0-144-generic

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2117395/+subscriptions

--
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs