Hi Matthew,

Thanks for the update. I went ahead and tested your updated packages on Focal, Jammy, and Noble images in EC2 this evening. With the latest packages installed, I was unable to reproduce the problem on any of the three installs. I'm not sure which builds were inconsistent about triggering the problem for you, but it might be worth noting that the package versions after Focal got an additional partial fix for the superblock checksum mismatch. With that fix, the read of the block is retried up to 3 times before a failure is returned. In my previous testing, this increased the amount of time before one hits the problem, but did not eliminate it entirely.
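To illustrate the retry behaviour described above, here is a minimal shell sketch of a bounded retry. It is not the e2fsprogs code (which is written in C), and `retry_read` and its arguments are hypothetical names made up for this example.

```shell
# Hypothetical helper, not part of e2fsprogs: run a command up to
# max_tries times and stop at the first success.
retry_read() {
    max_tries=$1
    shift
    attempt=1
    while [ "$attempt" -le "$max_tries" ]; do
        # "$@" stands in for whatever performs the read and checksum check.
        if "$@"; then
            return 0
        fi
        attempt=$((attempt + 1))
    done
    # Every attempt failed, so the failure is reported to the caller.
    return 1
}
```

Because each attempt can race with a concurrent superblock update in the same way as the first, a bounded retry like this only makes the failure less likely; all three attempts can still fail.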
Thanks again for your help with getting these patches in. It's much appreciated!

-K

-- 
You received this bug notification because you are a member of Ubuntu Touch seeded packages, which is subscribed to e2fsprogs in Ubuntu.
https://bugs.launchpad.net/bugs/2036467

Title:
  Resizing cloud-images occasionally fails due to superblock checksum mismatch in resize2fs

Status in cloud-images:
  New
Status in e2fsprogs package in Ubuntu:
  In Progress
Status in e2fsprogs source package in Trusty:
  Won't Fix
Status in e2fsprogs source package in Xenial:
  Won't Fix
Status in e2fsprogs source package in Bionic:
  Won't Fix
Status in e2fsprogs source package in Focal:
  In Progress
Status in e2fsprogs source package in Jammy:
  In Progress
Status in e2fsprogs source package in Lunar:
  In Progress
Status in e2fsprogs source package in Mantic:
  In Progress

Bug description:
  [Impact]

  This is a long-running bug plaguing cloud-images, where on rare occasions resize2fs would fail and the image would not resize to fit the entire disk.

  Online resizes would fail due to a superblock checksum mismatch, where the superblock in memory differs from what is currently on disk due to changes made to the image.

  $ resize2fs /dev/nvme1n1p1
  resize2fs 1.47.0 (5-Feb-2023)
  resize2fs: Superblock checksum does not match superblock while trying to open /dev/nvme1n1p1
  Couldn't find valid filesystem superblock.

  Changing the read of the superblock to Direct I/O solves the issue.

  [Testcase]

  Start a c5.large instance on AWS, and attach a 60 GB gp3 volume for use as a scratch disk. Run the following script, courtesy of Krister Johansen and his team:

  #!/usr/bin/bash
  set -euxo pipefail

  while true
  do
      # Create a small partition and a fresh ext4 filesystem on it.
      parted /dev/nvme1n1 mklabel gpt mkpart primary 2048s 2099200s
      sleep .5
      mkfs.ext4 /dev/nvme1n1p1
      mount -t ext4 /dev/nvme1n1p1 /mnt
      # Generate filesystem activity in the background while resizing.
      stress-ng --temp-path /mnt -D 4 &
      STRESS_PID=$!
      sleep 1
      # Grow the partition, then resize the mounted filesystem online.
      growpart /dev/nvme1n1 1
      resize2fs /dev/nvme1n1p1
      kill $STRESS_PID
      wait $STRESS_PID
      # Clean up so the next iteration starts from an empty disk.
      umount /mnt
      wipefs -a /dev/nvme1n1p1
      wipefs -a /dev/nvme1n1
  done

  Test packages are available in the following ppa:

  https://launchpad.net/~mruffell/+archive/ubuntu/lp2036467-test

  If you install the test packages, the race no longer occurs.

  [Where problems could occur]

  We are changing how resize2fs reads the superblock from underlying disks.

  If a regression were to occur, resize2fs could fail to resize offline or online volumes. As all cloud-images are online resized during their initial boot, this could have a large impact on public and private clouds should a regression occur.

  [Other info]

  Upstream mailing list discussion:
  https://lore.kernel.org/linux-ext4/20230605225221.ga5...@templeofstupid.com/
  https://lore.kernel.org/linux-ext4/20230609042239.ga1436...@mit.edu/

  This was fixed in the below commit upstream:

  commit 43a498e938887956f393b5e45ea6ac79cc5f4b84
  Author: Theodore Ts'o <ty...@mit.edu>
  Date: Thu, 15 Jun 2023 00:17:01 -0400
  Subject: resize2fs: use Direct I/O when reading the superblock for online resizes
  Link: https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git/commit/?id=43a498e938887956f393b5e45ea6ac79cc5f4b84

  The commit has not been tagged to any release. All supported Ubuntu releases require this fix, and it needs to be published in the standard non-ESM archives to be picked up in cloud images.