** Summary changed:

- [Regression] kernel 5.15.0-144-generic -  discard broken with RAID10
+ raid10: block discard causes a NULL pointer dereference after 5.15.0-144-generic

** Description changed:

- After upgrading to jammy kernel 5.15.0-144-generic we encountered a
- serious regression when the weekly fstrim timer ran.
+ BugLink: https://bugs.launchpad.net/bugs/2117395
  
- This bug was introduced by commit "md/raid10: fix missing discard IO accounting"
- https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=4a05f7ae33716d996c5ce56478a36a3ede1d76f2
- which was backported to all stable kernels and became part of 5.15.181
+ [Impact]
  
- The issue was discovered earlier upstream[1] and also in Debian[2],
- which resulted in a fix being added to the Debian kernel and
- subsequently into 6.1. However the missing patch[3] did not make it into
- the 5.15-stable kernel triggering the regression also in Ubuntu jammy.
+ The commit below was backported to 5.15.181-stable and introduced a NULL
+ pointer dereference in the raid10 subsystem, because io_acct_set is only
+ initialised for raid 0 and raid 456, not for raid 1 or raid 10.
  
+ commit d05af90d6218e9c8f1c2026990c3f53c1b41bfb0
+ Author: Yu Kuai <yuku...@huawei.com>
+ Date:   Tue Mar 25 09:57:46 2025 +0800
+ Subject: md/raid10: fix missing discard IO accounting
+ Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d05af90d6218e9c8f1c2026990c3f53c1b41bfb0
  
- [1] https://lists.linaro.org/archives/list/linux-stable-mir...@lists.linaro.org/thread/TM2PPS3XKE6M5H2FW63MLZV2T7HTM3QJ/
- [2] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1104460
- [3] https://lore.kernel.org/all/20230621165110.1498313-2-yuku...@huaweicloud.com/
- 
- 
- dmesg:
+ Kernel oops:
  
  kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
  kernel: #PF: supervisor instruction fetch in kernel mode
  kernel: #PF: error_code(0x0010) - not-present page
- kernel: PGD 0 P4D 0 
+ kernel: PGD 0 P4D 0
  kernel: Oops: 0010 [#1] SMP PTI
  kernel: CPU: 5 PID: 784107 Comm: fstrim Not tainted 5.15.0-144-generic #157-Ubuntu
- kernel: Hardware name: FUJITSU /D3417-B2, BIOS V5.0.0.12 R1.27.0.SR.1 for D3417-B2x               06/10/2020
  kernel: RIP: 0010:0x0
  kernel: Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
  kernel: RSP: 0018:ffffb576409c7858 EFLAGS: 00010206
  kernel: RAX: 0000000000000000 RBX: 0000000000092800 RCX: 0000000000000001
  kernel: RDX: ffff8e7e012426f0 RSI: 0000000000000000 RDI: 0000000000092800
  kernel: RBP: ffffb576409c78c8 R08: ffff8e884ec966c0 R09: ffff8e7e07c6b050
  kernel: R10: 0000000000002ecb R11: 00000000000030c8 R12: 0000000000092c00
  kernel: R13: 0000000000000400 R14: ffff8e7e01242708 R15: ffff8e7e10743400
- kernel: FS:  00007f6fff9f0800(0000) GS:ffff8e8cee540000(0000) knlGS:0000000000000000
- kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
+ kernel: FS: 00007f6fff9f0800(0000) GS:ffff8e8cee540000(0000) knlGS:0000000000000000
+ kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  kernel: CR2: ffffffffffffffd6 CR3: 00000001090f6005 CR4: 00000000003706e0
  kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  kernel: Call Trace:
- kernel:  <TASK>
- kernel:  mempool_alloc+0x61/0x1b0
- kernel:  ? __kmalloc+0x179/0x330
- kernel:  bio_alloc_bioset+0x9d/0x370
- kernel:  ? r10bio_pool_alloc+0x26/0x30 [raid10]
- kernel:  bio_clone_fast+0x1f/0x90
- kernel:  md_account_bio+0x42/0x80
- kernel:  raid10_handle_discard+0x56f/0x6b0 [raid10]
- kernel:  raid10_make_request+0x147/0x180 [raid10]
- kernel:  md_handle_request+0x12a/0x1b0
- kernel:  ? submit_bio_checks+0x1a5/0x580
- kernel:  md_submit_bio+0x76/0xc0
- kernel:  __submit_bio+0x1a2/0x220
- kernel:  ? mempool_alloc_slab+0x17/0x20
- kernel:  ? mempool_alloc+0x61/0x1b0
- kernel:  ? schedule_timeout+0x91/0x140
- kernel:  __submit_bio_noacct+0x85/0x200
- kernel:  submit_bio_noacct+0x4e/0x120
- kernel:  ? __cond_resched+0x1a/0x60
- kernel:  submit_bio+0x4a/0x130
- kernel:  submit_bio_wait+0x5a/0xc0
- kernel:  blkdev_issue_discard+0x7e/0xd0
- kernel:  ext4_try_to_trim_range+0x2db/0x520
- kernel:  ? ext4_mb_load_buddy_gfp+0x91/0x3e0
- kernel:  ext4_trim_fs+0x313/0x510
- kernel:  __ext4_ioctl+0x82c/0xef0
- kernel:  ext4_ioctl+0xe/0x20
- kernel:  __x64_sys_ioctl+0x92/0xd0
- kernel:  x64_sys_call+0x1e5f/0x1fa0
- kernel:  do_syscall_64+0x56/0xb0
- kernel:  entry_SYSCALL_64_after_hwframe+0x6c/0xd6
- kernel: RIP: 0033:0x7f6fffc0994f
- kernel: Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 >
- kernel: RSP: 002b:00007ffdce979c30 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
- kernel: RAX: ffffffffffffffda RBX: 00007ffdce979d80 RCX: 00007f6fffc0994f
- kernel: RDX: 00007ffdce979ca0 RSI: 00000000c0185879 RDI: 0000000000000003
- kernel: RBP: 0000558436acccb0 R08: 0000558436acccb0 R09: 0000000000000000
- kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003
- kernel: R13: 0000558436accfa0 R14: 0000558436acce80 R15: 0000558436acce80
- kernel:  </TASK>
- kernel: Modules linked in: tls tcp_diag udp_diag inet_diag bridge stp llc nft_counter nft_chain_nat nf_nat >
- kernel:  xhci_pci_renesas wmi video
- kernel: CR2: 0000000000000000
- kernel: ---[ end trace db9334d27f904581 ]---
- kernel: RIP: 0010:0x0
- kernel: Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
- kernel: RSP: 0018:ffffb576409c7858 EFLAGS: 00010206
- kernel: RAX: 0000000000000000 RBX: 0000000000092800 RCX: 0000000000000001
- kernel: RDX: ffff8e7e012426f0 RSI: 0000000000000000 RDI: 0000000000092800
- kernel: RBP: ffffb576409c78c8 R08: ffff8e884ec966c0 R09: ffff8e7e07c6b050
- kernel: R10: 0000000000002ecb R11: 00000000000030c8 R12: 0000000000092c00
- kernel: R13: 0000000000000400 R14: ffff8e7e01242708 R15: ffff8e7e10743400
- kernel: FS:  00007f6fff9f0800(0000) GS:ffff8e8cee540000(0000) knlGS:0000000000000000
- kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
- kernel: CR2: ffffffffffffffd6 CR3: 00000001090f6005 CR4: 00000000003706e0
- kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
- kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
- kernel: BUG: unable to handle page fault for address: ffffb57600000010
+ kernel: <TASK>
+ kernel: mempool_alloc+0x61/0x1b0
+ kernel: ? __kmalloc+0x179/0x330
+ kernel: bio_alloc_bioset+0x9d/0x370
+ kernel: ? r10bio_pool_alloc+0x26/0x30 [raid10]
+ kernel: bio_clone_fast+0x1f/0x90
+ kernel: md_account_bio+0x42/0x80
+ kernel: raid10_handle_discard+0x56f/0x6b0 [raid10]
+ kernel: raid10_make_request+0x147/0x180 [raid10]
+ kernel: md_handle_request+0x12a/0x1b0
+ kernel: ? submit_bio_checks+0x1a5/0x580
+ kernel: md_submit_bio+0x76/0xc0
+ kernel: __submit_bio+0x1a2/0x220
+ kernel: ? mempool_alloc_slab+0x17/0x20
+ kernel: ? mempool_alloc+0x61/0x1b0
+ kernel: ? schedule_timeout+0x91/0x140
+ kernel: __submit_bio_noacct+0x85/0x200
+ kernel: submit_bio_noacct+0x4e/0x120
+ kernel: ? __cond_resched+0x1a/0x60
+ kernel: submit_bio+0x4a/0x130
+ kernel: submit_bio_wait+0x5a/0xc0
+ kernel: blkdev_issue_discard+0x7e/0xd0
+ kernel: ext4_try_to_trim_range+0x2db/0x520
+ kernel: ? ext4_mb_load_buddy_gfp+0x91/0x3e0
+ kernel: ext4_trim_fs+0x313/0x510
+ kernel: __ext4_ioctl+0x82c/0xef0
+ kernel: ext4_ioctl+0xe/0x20
+ kernel: __x64_sys_ioctl+0x92/0xd0
+ kernel: x64_sys_call+0x1e5f/0x1fa0
+ kernel: do_syscall_64+0x56/0xb0
+ kernel: entry_SYSCALL_64_after_hwframe+0x6c/0xd6
+ 
+ A workaround is to disable the weekly systemd fstrim timer and to avoid fstrim
+ or any other block discard until the fix is installed.
+ 
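The workaround above can be sketched in shell. This is a minimal sketch, not a vetted procedure: the `RUN` switch and the function names are hypothetical and exist only in this example, and the assumption that online discard (the `discard` mount option) reaches the same raid10 discard path as fstrim is an inference from the call trace rather than something tested here.

```shell
#!/bin/sh
# Sketch of the workaround. RUN, run(), disable_periodic_trim() and
# mounts_with_discard() are hypothetical names used only in this example.

# Print a command by default; execute it only when RUN=1 is set.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# Stop the weekly fstrim timer now and keep it from starting at boot.
disable_periodic_trim() {
    run systemctl disable --now fstrim.timer
}

# Print mount points that use online discard, reading a /proc/mounts-style
# file (default: /proc/mounts). Field 2 is the mount point, field 4 the
# comma-separated option list.
mounts_with_discard() {
    awk '{
        n = split($4, opt, ",")
        for (i = 1; i <= n; i++)
            if (opt[i] == "discard" || opt[i] ~ /^discard=/) { print $2; break }
    }' "${1:-/proc/mounts}"
}
```

After sourcing the snippet, `disable_periodic_trim` only prints the command; running `RUN=1 disable_periodic_trim` as root applies it. Any mount point reported by `mounts_with_discard` that sits on a raid10 array is a candidate for remounting without the `discard` option while the bug is present.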
+ [Fix]
+ 
+ The following commit, mainlined in 6.6-rc1, is required and needs to be
+ backported to jammy.
+ 
+ commit c567c86b90d4715081adfe5eb812141a5b6b4883
+ Author: Yu Kuai <yuku...@huawei.com>
+ Date:   Thu Jun 22 00:51:03 2023 +0800
+ Subject: md: move initialization and destruction of 'io_acct_set' to md.c
+ Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c567c86b90d4715081adfe5eb812141a5b6b4883
+ 
+ This needs a minor backport, adjusting __md_stop() to md_stop().
+ 
+ [Testcase]
+ 
+ You will need a machine with at least four NVMe drives that support block
+ discard. I use an i3.8xlarge instance on AWS, which meets both requirements.
+ 
+ $ lsblk
+ xvda 202:0 0 8G 0 disk
+ └─xvda1 202:1 0 8G 0 part /
+ nvme0n1 259:2 0 1.7T 0 disk
+ nvme1n1 259:0 0 1.7T 0 disk
+ nvme2n1 259:1 0 1.7T 0 disk
+ nvme3n1 259:3 0 1.7T 0 disk
+ 
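Before building the array it can help to confirm that the drives really advertise discard. The following is a minimal sketch that reads `queue/discard_max_bytes` from sysfs, where a value of 0 means the device does not support discard. The function name and the optional directory argument are hypothetical and exist only to keep the example self-contained.

```shell
#!/bin/sh
# Sketch: list block devices whose request queue advertises discard support.
# discard_capable_devices() is a hypothetical helper for this example.
# $1 is the sysfs block directory (default: /sys/block).
discard_capable_devices() {
    root="${1:-/sys/block}"
    for f in "$root"/*/queue/discard_max_bytes; do
        # Skip devices without the attribute (and the unexpanded glob).
        [ -r "$f" ] || continue
        max=$(cat "$f")
        case "$max" in
            ''|0) ;;   # empty or zero: discard is not supported
            *)
                dev=${f%/queue/discard_max_bytes}
                echo "${dev##*/}"
                ;;
        esac
    done
}
```

Calling `discard_capable_devices` with no argument inspects the running system. `lsblk --discard` reports the same information in tabular form.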
+ Create a RAID10 array:
+ 
+ $ sudo mdadm --create --verbose /dev/md0 --level=10 --raid-devices=4 \
+     /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1
+ 
+ Format the array with XFS and mount it:
+ 
+ $ sudo mkfs.xfs /dev/md0
+ 
+ $ sudo mkdir /mnt/disk
+ $ sudo mount /dev/md0 /mnt/disk
+ 
+ Run fstrim, which triggers the oops above on an affected kernel:
+ 
+ $ sudo fstrim /mnt/disk
+ 
+ Test packages are available in the following PPA:
+ 
+ https://launchpad.net/~mruffell/+archive/ubuntu/sf414897-test
+ 
+ With the test kernel installed, fstrim no longer panics the kernel.
+ 
+ [Where problems can occur]
+ 
+ This changes io_acct_set from being initialised only for some raid levels,
+ mainly raid 0 and raid 456, to always being initialised for every raid level.
+ 
+ If a regression were to occur, it would likely affect block discard on any raid
+ type, not just raid 10. Raid 10 carries more risk, though, because discard
+ support there is comparatively new (added within the last five or so years), so
+ we may be missing more patches, whereas raid 0 and raid 456 have had full
+ discard support for a decade or more.
+ 
+ The workarounds would be the same: disable the systemd fstrim timer and avoid
+ running fstrim.
+ 
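Since a regression could touch discard on any raid level, it is useful to know which md arrays a host has and at what level. The following is a minimal sketch that parses `/proc/mdstat`. The function name and the optional file argument are hypothetical, and it assumes the usual `mdN : active raidN ...` line layout.

```shell
#!/bin/sh
# Sketch: print "<array> <level>" for each md array listed in an mdstat file.
# md_array_levels() is a hypothetical helper for this example.
# $1 is the file to read (default: /proc/mdstat).
md_array_levels() {
    awk '/^md/ {
        # The level follows the state; scan the fields because a state such
        # as "active (auto-read-only)" shifts its position.
        for (i = 3; i <= NF; i++)
            if ($i ~ /^raid[0-9]+$/) { print $1, $i; break }
    }' "${1:-/proc/mdstat}"
}
```

Arrays reported as `raid10` are the ones exposed to the oops described here. The others are the ones to include when checking discard for regressions after the fix.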
+ [Other info]
+ 
+ Upstream bug:
+ https://lists.linaro.org/archives/list/linux-stable-mir...@lists.linaro.org/thread/TM2PPS3XKE6M5H2FW63MLZV2T7HTM3QJ/
+ 
+ Debian bug:
+ https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1104460

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2117395

Title:
  raid10: block discard causes a NULL pointer dereference after
  5.15.0-144-generic

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2117395/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
