[Bug 2090972] Re: /boot intermittently fails to mount on boot

Matthew Ruffell Wed, 04 Dec 2024 16:45:38 -0800

** Description changed:

+ [Impact]
+ 
  Starting on Noble, we see /boot fail to mount in approximately one out
  of every two thousand boots.  The error looks like this:
  
-    Found device dev-disk-by\x2dlabel-BOOT.device - QEMU NVMe Ctrl BOOT.
-    Starting systemd-fsck@dev-disk-by… Check on /dev/disk/by-label/BOOT...
-    Checking in progress on 1 disk (0.0% complete)
-    Checking in progress on 0 disks (100.0% complete)
-    Finished msystemd-fsck@dev-disk-by… Check on /dev/disk/by-label/BOOT.
-    Mounting boot.mount - /boot...
-    [    3.051612] /dev/disk/by-label/BOOT: Can't lookup blockdev
-    FAILED Failed to mount boot.mount - /boot.
-    See 'systemctl status boot.mount' for details.
+    Found device dev-disk-by\x2dlabel-BOOT.device - QEMU NVMe Ctrl BOOT.
+    Starting systemd-fsck@dev-disk-by… Check on /dev/disk/by-label/BOOT...
+    Checking in progress on 1 disk (0.0% complete)
+    Checking in progress on 0 disks (100.0% complete)
+    Finished msystemd-fsck@dev-disk-by… Check on /dev/disk/by-label/BOOT.
+    Mounting boot.mount - /boot...
+    [    3.051612] /dev/disk/by-label/BOOT: Can't lookup blockdev
+    FAILED Failed to mount boot.mount - /boot.
+    See 'systemctl status boot.mount' for details.
  
  This has resulted in a number of different failure modes for our users.
  
  Anything that needs to interact with /boot during provisioning will
  fail.  This is usually something running update-grub or similar.
  
  If we manage to succeed in booting, this can cause subsequent kernel
  updates or tools that install kernel modules to fail, because
  update-grub fails.
  
  We've _also_ seen this manifest on the root filesystem.  In that case,
  the boot succeeded, but the by-label links remain absent.  When this
  occurs, we find installing kernel packages fails because mkinitramfs
  can't locate the root disk by label.
+ 
+ [ Testcase ]
  
  It's a vexing problem, and so to reproduce we ran cloud images in a boot
  loop until we could reliably reproduce the problem.  Unfortunately, we
  weren't able to work out anything that made this happen faster, so it's
  been a bit slow coming.
  
  It turns out the problem here is that libblkid recently added support to
  compute the checksum of the superblocks on ext4 filesystems, and Noble
  is the first release to include a version of util-linux new enough to
  have this feature.  When libblkid determines an ext4 superblock's
  checksum is corrupt, it refuses to identify the device as having a
  filesystem, which leads to the removal of the uuid and by-label fields.
  systemd-udevd then removes these symlinks.  This is where it all goes
  wrong.
  
  From our debug traces, it's possible to see this clearly:
  
  (udev-worker)[208]: nvme0n1p16: Probe /dev/nvme0n1p16 with raid and offset=0
  
  systemd-udevd[208]: 208: libblkid: LOWPROBE: [36] ext4dev:
  systemd-udevd[208]: 208: libblkid:   BUFFER:         reuse: off=1024 len=1024 (for off=1024 len=1024)
  systemd-udevd[208]: 208: libblkid: LOWPROBE:         magic sboff=56, kboff=1
  systemd-udevd[208]: 208: libblkid: LOWPROBE:         call probefunc()
  systemd-udevd[208]: 208: libblkid:   BUFFER:         reuse: off=1024 len=1024 (for off=1024 len=1024)
  systemd-udevd[208]: incorrect checksum for type ext4dev, got D919EB5600000000, expected A47F6CF000000000
  systemd-udevd[208]: 208: libblkid: LOWPROBE: [37] ext4:
  systemd-udevd[208]: 208: libblkid:   BUFFER:         reuse: off=1024 len=1024 (for off=1024 len=1024)
  systemd-udevd[208]: 208: libblkid: LOWPROBE:         magic sboff=56, kboff=1
  systemd-udevd[208]: 208: libblkid: LOWPROBE:         call probefunc()
  systemd-udevd[208]: 208: libblkid:   BUFFER:         reuse: off=1024 len=1024 (for off=1024 len=1024)
  systemd-udevd[208]: incorrect checksum for type ext4, got D919EB5600000000, expected A47F6CF000000000
  
  (udev-worker)[208]: nvme0n1p16: Removing/updating old device symlink '/dev/disk/by-uuid/ce209fd3-a875-4607-9089-35b9de605bd0', which is no longer belonging to this device.
  (udev-worker)[208]: nvme0n1p16: No reference left for '/dev/disk/by-uuid/ce209fd3-a875-4607-9089-35b9de605bd0', removing
  (udev-worker)[208]: nvme0n1p16: Removing/updating old device symlink '/dev/disk/by-label/BOOT', which is no longer belonging to this device.
  (udev-worker)[208]: nvme0n1p16: No reference left for '/dev/disk/by-label/BOOT', removing
  (udev-worker)[208]: nvme0n1p16: Successfully created symlink '/dev/block/259:4' to '/dev/nvme0n1p16'
  (udev-worker)[208]: nvme0n1p16: sd-device: Created db file '/run/udev/data/b259:4' for '/devices/pci0000:00/0000:00:01.0/nvme/nvme0/nvme0n1/nvme0n1p16'
  (udev-worker)[208]: nvme0n1p16: Adding watch on '/dev/nvme0n1p16'
  (udev-worker)[208]: nvme0n1p16: Device processed (SEQNUM=1630, ACTION=change)
  (udev-worker)[208]: nvme0n1p16: sd-device-monitor(worker): Passed 1315 byte to netlink monitor.
  
+ We've also been running with a version of this patch backported to
+ 2.39.3-9ubuntu6.1 in our production environment for a couple of months.
+ It's completely eliminated this problem for us.  We were observing this
+ about once an hour and have had no recurrence since rolling out the fix.
+ 
+ There are test packages available in the following ppa:
+ 
+ https://launchpad.net/~mruffell/+archive/ubuntu/lp2090972-updates
+ 
+ [Where problems could occur]
+ 
+ We are changing how superblocks are read off of filesystems at initial device
+ probe time. The initial read of the superblock is unchanged from what happens
+ now. Only if that read fails checksum verification do we re-read the
+ superblock from the underlying disk with O_DIRECT and recompute the checksum.
+ If that still yields a mismatch, we raise the incorrect checksum error as
+ usual.
+ 
+ Since this only adds a quick re-read and recompute of the checksum, if a
+ race between memory and the underlying disk does occur, the recheck
+ takes a minimal amount of time and is not noticeable during instance
+ boot.
+ 
+ If a regression were to occur, it would affect checksum computation, which could
+ cause the disk to be declared corrupted and not containing a valid filesystem,
+ which would cause boot to fail and have a large impact on users.
+ 
+ [Other Info]
  
  Fortunately, the fix here is straightforward and is similar to what we
  did for resize2fs: use O_DIRECT when reading the superblock.  We've
  already sent a patch upstream and gotten it accepted there:
  
- https://github.com/util-linux/util-linux/commit/483c9f38e377ff0b009f546a2c4ee91a1d61588c
- 
- We've also been running with a version of this patch backported to
- 2.39.3-9ubuntu6.1 in our production environment for a couple of months.
- It's completely eliminated this problem for us.  We were observing this
- about once an hour and have had no recurrence since rolling out the fix.
+ commit 483c9f38e377ff0b009f546a2c4ee91a1d61588c
+ From: Krister Johansen <k...@templeofstupid.com>
+ Date: Mon, 18 Nov 2024 12:35:22 -0800
+ Subject: libblkid: fix spurious ext superblock checksum mismatches
+ Link: https://github.com/util-linux/util-linux/commit/483c9f38e377ff0b009f546a2c4ee91a1d61588c
  
  We had a discussion with Ted Ts'o about this as well, and he had some
  ideas for future improvements, but nothing that we're implementing in
  this fix:
  
  https://lore.kernel.org/util-linux/6d16e6d83ab48d2ea4402db17c9c0ed5514933a7.1731961869.git.k...@templeofstupid.com/T/#m55eb5087639dcfcfd5708144b1b48caf0cf762b8
- 
- I'm attaching the patch that we've applied to our tree and would be
- grateful if you'd pull this into util-linux >= 2.39.
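
For readers who want to see the shape of the retry described under
[Where problems could occur], here is a minimal sketch in Python. It is
not the util-linux patch (that is C inside libblkid); it only illustrates
the idea of verifying the superblock checksum on a normal buffered read
and falling back to a single O_DIRECT re-read on mismatch. It assumes the
standard ext4 on-disk layout (superblock at byte 1024, s_magic at offset
0x38, s_feature_ro_compat at 0x64, s_checksum at 0x3FC, CRC32C seeded
with ~0 and no final inversion). All function names are hypothetical.

```python
import mmap
import os
import struct

SB_OFFSET = 1024                  # ext superblock starts 1 KiB into the device
SB_SIZE = 1024
MAGIC_OFF = 0x38                  # s_magic ("sboff=56" in the trace above)
RO_COMPAT_OFF = 0x64              # s_feature_ro_compat
CSUM_OFF = 0x3FC                  # s_checksum, last 4 bytes of the superblock
EXT_MAGIC = 0xEF53
RO_COMPAT_METADATA_CSUM = 0x0400


def _make_table():
    # Byte-wise lookup table for CRC32C (reflected polynomial 0x82F63B78).
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_table()


def crc32c_raw(seed, data):
    """CRC32C without the final bit inversion."""
    crc = seed & 0xFFFFFFFF
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc


def superblock_csum_ok(sb):
    """Return True if sb looks like an ext superblock with a valid checksum."""
    if len(sb) < SB_SIZE:
        return False
    (magic,) = struct.unpack_from("<H", sb, MAGIC_OFF)
    if magic != EXT_MAGIC:
        return False
    (ro_compat,) = struct.unpack_from("<I", sb, RO_COMPAT_OFF)
    if not ro_compat & RO_COMPAT_METADATA_CSUM:
        return True               # filesystem carries no superblock checksum
    (stored,) = struct.unpack_from("<I", sb, CSUM_OFF)
    return crc32c_raw(0xFFFFFFFF, sb[:CSUM_OFF]) == stored


def read_sb_buffered(path):
    # Ordinary read through the page cache.
    with open(path, "rb") as f:
        f.seek(SB_OFFSET)
        return f.read(SB_SIZE)


def read_sb_direct(path):
    # O_DIRECT needs an aligned buffer, offset and length; an anonymous
    # mmap is page-aligned, and we read the first 4096 bytes of the device.
    fd = os.open(path, os.O_RDONLY | os.O_DIRECT)
    try:
        with mmap.mmap(-1, 4096) as buf:
            nread = os.preadv(fd, [buf], 0)
            if nread < SB_OFFSET + SB_SIZE:
                return b""
            return bytes(buf[SB_OFFSET:SB_OFFSET + SB_SIZE])
    finally:
        os.close(fd)


def probe(path):
    """Buffered read first; retry once with O_DIRECT only on mismatch."""
    if superblock_csum_ok(read_sb_buffered(path)):
        return True
    return superblock_csum_ok(read_sb_direct(path))
```

On a test machine this could be called as root with a partition path such
as probe("/dev/nvme0n1p16"). The direct read assumes the device's logical
block size divides 4096 and that the filesystem backing the path supports
O_DIRECT.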


-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2090972

Title:
  /boot intermittently fails to mount on boot

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/util-linux/+bug/2090972/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
