** Description changed:

  BugLink: https://bugs.launchpad.net/bugs/2117395
  
  [Impact]
  
  The commit below was backported to 5.15.181 -stable and introduced a NULL
  pointer dereference in the raid10 subsystem, because io_acct_set is only
  initialised for raid 0 and 456, and not for raid 1 or 10.
  
  commit d05af90d6218e9c8f1c2026990c3f53c1b41bfb0
  Author: Yu Kuai <yuku...@huawei.com>
  Date:   Tue Mar 25 09:57:46 2025 +0800
  Subject: md/raid10: fix missing discard IO accounting
  Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d05af90d6218e9c8f1c2026990c3f53c1b41bfb0
  
  Kernel oops:
  
  kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
  kernel: #PF: supervisor instruction fetch in kernel mode
  kernel: #PF: error_code(0x0010) - not-present page
  kernel: PGD 0 P4D 0
  kernel: Oops: 0010 [#1] SMP PTI
  kernel: CPU: 5 PID: 784107 Comm: fstrim Not tainted 5.15.0-144-generic #157-Ubuntu
  kernel: RIP: 0010:0x0
  kernel: Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
  kernel: RSP: 0018:ffffb576409c7858 EFLAGS: 00010206
  kernel: RAX: 0000000000000000 RBX: 0000000000092800 RCX: 0000000000000001
  kernel: RDX: ffff8e7e012426f0 RSI: 0000000000000000 RDI: 0000000000092800
  kernel: RBP: ffffb576409c78c8 R08: ffff8e884ec966c0 R09: ffff8e7e07c6b050
  kernel: R10: 0000000000002ecb R11: 00000000000030c8 R12: 0000000000092c00
  kernel: R13: 0000000000000400 R14: ffff8e7e01242708 R15: ffff8e7e10743400
  kernel: FS: 00007f6fff9f0800(0000) GS:ffff8e8cee540000(0000) knlGS:0000000000000000
  kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  kernel: CR2: ffffffffffffffd6 CR3: 00000001090f6005 CR4: 00000000003706e0
  kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  kernel: Call Trace:
  kernel: <TASK>
  kernel: mempool_alloc+0x61/0x1b0
  kernel: ? __kmalloc+0x179/0x330
  kernel: bio_alloc_bioset+0x9d/0x370
  kernel: ? r10bio_pool_alloc+0x26/0x30 [raid10]
  kernel: bio_clone_fast+0x1f/0x90
  kernel: md_account_bio+0x42/0x80
  kernel: raid10_handle_discard+0x56f/0x6b0 [raid10]
  kernel: raid10_make_request+0x147/0x180 [raid10]
  kernel: md_handle_request+0x12a/0x1b0
  kernel: ? submit_bio_checks+0x1a5/0x580
  kernel: md_submit_bio+0x76/0xc0
  kernel: __submit_bio+0x1a2/0x220
  kernel: ? mempool_alloc_slab+0x17/0x20
  kernel: ? mempool_alloc+0x61/0x1b0
  kernel: ? schedule_timeout+0x91/0x140
  kernel: __submit_bio_noacct+0x85/0x200
  kernel: submit_bio_noacct+0x4e/0x120
  kernel: ? __cond_resched+0x1a/0x60
  kernel: submit_bio+0x4a/0x130
  kernel: submit_bio_wait+0x5a/0xc0
  kernel: blkdev_issue_discard+0x7e/0xd0
  kernel: ext4_try_to_trim_range+0x2db/0x520
  kernel: ? ext4_mb_load_buddy_gfp+0x91/0x3e0
  kernel: ext4_trim_fs+0x313/0x510
  kernel: __ext4_ioctl+0x82c/0xef0
  kernel: ext4_ioctl+0xe/0x20
  kernel: __x64_sys_ioctl+0x92/0xd0
  kernel: x64_sys_call+0x1e5f/0x1fa0
  kernel: do_syscall_64+0x56/0xb0
  kernel: entry_SYSCALL_64_after_hwframe+0x6c/0xd6
  
  A workaround is to disable the weekly systemd fstrim timer and to avoid running
  fstrim or otherwise discarding blocks while the problem exists.
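
The workaround can be sketched roughly as follows. This assumes the weekly timer is the stock fstrim.timer unit and that array personalities are read from /proc/mdstat; the function names are hypothetical helpers, not part of any package.

```shell
# List md arrays whose personality is raid10, one name per line.
# Reads /proc/mdstat by default; another file can be passed for testing.
list_raid10_arrays() {
    awk '/^md/ {
        for (i = 3; i <= NF; i++)
            if ($i == "raid10") { print $1; break }
    }' "${1:-/proc/mdstat}"
}

# Stop and disable the weekly trim timer (assumed unit name: fstrim.timer).
# Defined as a function so that nothing runs until it is called explicitly.
disable_weekly_trim() {
    sudo systemctl disable --now fstrim.timer
}
```

If list_raid10_arrays prints anything on a machine running an affected kernel, calling disable_weekly_trim keeps the timer from triggering the oops until a fixed kernel is installed.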
  
  [Fix]
  
  The below necessary commit was mainlined in 6.6-rc1 and needs to be backported
  to jammy.
  
  commit c567c86b90d4715081adfe5eb812141a5b6b4883
  Author: Yu Kuai <yuku...@huawei.com>
  Date:   Thu Jun 22 00:51:03 2023 +0800
  Subject: md: move initialization and destruction of 'io_acct_set' to md.c
  Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c567c86b90d4715081adfe5eb812141a5b6b4883
  
  This needs a minor backport, adjusting __md_stop() to md_stop().
  
  [Testcase]
  
- You will need a machine with at least 4x NVMe drives which support block 
+ You will need a machine with at least 4x NVMe drives which support block
  discard. I use an i3.8xlarge instance on AWS, since it has all of these things.
  
  $ lsblk
  xvda 202:0 0 8G 0 disk
  └─xvda1 202:1 0 8G 0 part /
  nvme0n1 259:2 0 1.7T 0 disk
  nvme1n1 259:0 0 1.7T 0 disk
  nvme2n1 259:1 0 1.7T 0 disk
  nvme3n1 259:3 0 1.7T 0 disk
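
Before building the array, it may be worth confirming that each drive really advertises discard. A minimal sketch, assuming the kernel exposes queue/discard_max_bytes under /sys/block for the device; supports_discard is a hypothetical helper name, and the optional second argument exists only so the check can be pointed at a different sysfs root.

```shell
# Succeed if the block device reports a non-zero discard_max_bytes.
# $1: device name without /dev/ (e.g. nvme0n1); $2: sysfs root (default /sys).
supports_discard() {
    dev=$1
    sysfs=${2:-/sys}
    max=$(cat "$sysfs/block/$dev/queue/discard_max_bytes" 2>/dev/null) || return 1
    [ "${max:-0}" -gt 0 ]
}
```

For example, running supports_discard nvme0n1 && echo ok for each of the four drives should print ok on a suitable machine.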
  
  Create a Raid10 array:
  
  $ sudo mdadm --create --verbose /dev/md0 --level=10 --raid-devices=4 \
      /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1
  
- Format the array with XFS:
+ Format the array with XFS (use -K to disable initial discard):
  
- $ sudo mkfs.xfs /dev/md0
+ $ sudo mkfs.xfs -K /dev/md0
  
  $ sudo mkdir /mnt/disk
  $ sudo mount /dev/md0 /mnt/disk
  
  Run fstrim:
  
  $ sudo fstrim /mnt/disk
  
  There are test packages available in the following ppa:
  
  https://launchpad.net/~mruffell/+archive/ubuntu/sf414897-test
  
  If you install the test kernel, the kernel will no longer panic on
  fstrim.
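
To confirm the result either way, the kernel log can be scanned for the signature of the oops shown in [Impact]. This is only a sketch: raid10_discard_oops_seen is a hypothetical helper, and it assumes the log text is supplied on standard input (for example from dmesg).

```shell
# Succeed only if the log on stdin contains both the NULL pointer
# dereference banner and a raid10_handle_discard frame.
raid10_discard_oops_seen() {
    awk '/BUG: kernel NULL pointer dereference/ { bug = 1 }
         /raid10_handle_discard/               { frame = 1 }
         END { exit !(bug && frame) }'
}
```

After running fstrim, something like sudo dmesg | raid10_discard_oops_seen && echo "oops reproduced" should print the message on an affected kernel and stay silent on the test kernel.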
  
  [Where problems can occur]
  
  This changes io_acct_set from being initialised only sometimes, mostly under
  raid 0 and 456, to always being initialised under all raid types.
  
  If a regression were to occur, it would likely impact block discard on any raid
  type, not just raid 10. Raid 10 would carry more risk, as we may be missing
  more patches: discard on raid10 is relatively new, having landed in the last 5
  or so years, whereas raid 0 and 456 have had full discard for a decade or more.
  
  The workarounds would be the same: disable the systemd fstrim timer and avoid
  running fstrim.
  
  [Other info]
  
  Upstream bug:
  
https://lists.linaro.org/archives/list/linux-stable-mir...@lists.linaro.org/thread/TM2PPS3XKE6M5H2FW63MLZV2T7HTM3QJ/
  
  Debian bug:
  https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1104460

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2117395

Title:
  raid10: block discard causes a NULL pointer dereference after
  5.15.0-144-generic

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2117395/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
