[Kernel-packages] [Bug 2045586] Re: livecd-rootfs uses losetup -P for theoretically reliable/synchronous partition setup but it's not reliable
I've been seeing this bug in jammy for as long as losetup has been used there, and the previously merged fix significantly improves the build success rate: this week in jammy, 0 of 3 builds succeeded without the patch applied, versus 5 of 5 with it. Is someone already looking to SRU that patch back to mantic and jammy, or should I work on it? Even if the fix isn't perfect, I think it would be extremely beneficial.

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2045586

Title: livecd-rootfs uses losetup -P for theoretically reliable/synchronous partition setup but it's not reliable

Status in linux package in Ubuntu: New
Status in livecd-rootfs package in Ubuntu: Fix Released
Status in util-linux package in Ubuntu: New
Status in linux source package in Jammy: New
Status in livecd-rootfs source package in Jammy: New
Status in util-linux source package in Jammy: New
Status in linux source package in Mantic: New
Status in livecd-rootfs source package in Mantic: New
Status in util-linux source package in Mantic: New
Status in linux source package in Noble: New
Status in livecd-rootfs source package in Noble: Fix Released
Status in util-linux source package in Noble: New

Bug description:
In mantic, we migrated livecd-rootfs to use losetup -P instead of kpartx, with the expectation that this would give us a reliable, race-free way of loop-mounting partitions from a disk image during image build. In noble, we are finding that it is no longer reliable, and in fact fails rather often. It is most noticeable with riscv64 builds, which is the architecture where we most frequently ran into problems before with kpartx.
The first riscv64+generic build in noble where the expected loop partition device is not available is https://launchpad.net/~ubuntu-cdimage/+livefs/ubuntu/noble/cpc/+build/531790

The failure is however not unique to riscv64, and the autopkgtest for the latest version of livecd-rootfs (24.04.7) - an update that specifically tries to add more debugging code for this scenario - has also failed on ppc64el.
https://autopkgtest.ubuntu.com/packages/l/livecd-rootfs/noble/ppc64el

The first failure happened on November 16. While there has been an update to the util-linux package in noble, this did not land until November 23.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2045586/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 2045586] Re: livecd-rootfs uses losetup -P for theoretically reliable/synchronous partition setup but it's not reliable
I tested the flock-based solution with some of the CPC pipelines in jammy and saw consistently clean builds (30 images built successfully yesterday). Thank you very much to everyone for the hard work debugging and fixing this race condition!

--
Status in linux package in Ubuntu: New
Status in livecd-rootfs package in Ubuntu: Fix Released
Status in util-linux package in Ubuntu: New
Status in linux source package in Jammy: New
Status in livecd-rootfs source package in Jammy: Fix Released
Status in util-linux source package in Jammy: New
Status in linux source package in Mantic: New
Status in livecd-rootfs source package in Mantic: New
Status in util-linux source package in Mantic: New
Status in linux source package in Noble: New
Status in livecd-rootfs source package in Noble: Fix Released
Status in util-linux source package in Noble: New

Bug description:
[impact]
In mantic, we migrated livecd-rootfs to use losetup -P instead of kpartx, with the expectation that this would give us a reliable, race-free way of loop-mounting partitions from a disk image during image build. In noble, we are finding that it is no longer reliable, and in fact fails rather often. It is most noticeable with riscv64 builds, which is the architecture where we most frequently ran into problems before with kpartx.

The first riscv64+generic build in noble where the expected loop partition device is not available is https://launchpad.net/~ubuntu-cdimage/+livefs/ubuntu/noble/cpc/+build/531790

The failure is however not unique to riscv64, and the autopkgtest for the latest version of livecd-rootfs (24.04.7) - an update that specifically tries to add more debugging code for this scenario - has also failed on ppc64el.
https://autopkgtest.ubuntu.com/packages/l/livecd-rootfs/noble/ppc64el

The first failure happened on November 16. While there has been an update to the util-linux package in noble, this did not land until November 23.

The losetup usage has been backported to Jammy, and sees frequent failures there.

[test case]
The autopkgtests will provide enough confidence that the changes are not completely broken. Whether the change helps with the races on riscv can be "tested in prod" just as well as any other way.

[regression potential]
If the backport has been done incorrectly, image builds can fail (and the autopkgtests will fail if it has been completely bungled). This can be quickly handled. There is no foreseeable way for this to result in successful builds but broken images, which would be a much more difficult failure mode to unpick.
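The thread refers to a "flock-based solution" without spelling it out. The general pattern, as described in systemd's block-device locking documentation, is to hold an exclusive BSD lock (flock(2)) on the whole-disk device while working with its partitions, because udev takes a shared lock before probing a block device and refrains from processing it while someone else holds the lock. The sketch below is a minimal Python illustration of that pattern under those assumptions; it is not the livecd-rootfs patch, and the helper name `whole_disk_lock` is hypothetical.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def whole_disk_lock(device_path):
    """Hold an exclusive BSD lock on a whole-disk device (hypothetical helper).

    Assumption: udev honours this lock (per systemd's block-device locking
    convention) and leaves the device and its partitions alone while it is held.
    """
    fd = os.open(device_path, os.O_RDONLY | os.O_CLOEXEC)
    try:
        # Blocks until no other open file description holds a conflicting lock,
        # e.g. the shared lock udev takes while probing the device.
        fcntl.flock(fd, fcntl.LOCK_EX)
        yield fd
    finally:
        # Release the lock and close the descriptor even if the body raised.
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)
```

In a build script, partitions of a loop device such as /dev/loop0 (an example path only) would be mounted and populated inside `with whole_disk_lock("/dev/loop0"):`, so that udev cannot re-probe the device midway through. Because flock locks belong to the open file description, the lock is dropped automatically if the process dies, which avoids stale lock files.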
[Kernel-packages] [Bug 2063315] Re: Suspend & Resume functionality broken/timesout in GCE
** Changed in: linux-gcp (Ubuntu)
       Status: Triaged => Fix Released

** Changed in: linux-gcp (Ubuntu Noble)
       Status: Triaged => Fix Released

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-gcp in Ubuntu.
https://bugs.launchpad.net/bugs/2063315

Title: Suspend & Resume functionality broken/timesout in GCE

Status in Release Notes for Ubuntu: New
Status in linux-gcp package in Ubuntu: Fix Released
Status in linux-gcp source package in Noble: Fix Released

Bug description:
[Impact]
Suspend/Resume capability is broken in all noble images with kernel version 6.8.0-1007-gcp.

GCE offers the capability to "Suspend" a VM to conserve power/lower costs when the instance is not in use [0]. It uses ACPI S3 signals to tell the guest to power down. This capability no longer works in the latest kernel with the following error:

```
Operation type [suspend] failed with message "Instance suspend failed due to guest timeout."
```

which points to the following [1].

Refs:
[0]: https://cloud.google.com/compute/docs/instances/suspend-resume-instance
[1]: https://cloud.google.com/compute/docs/troubleshooting/troubleshooting-suspend-resume#there_was_a_guest_timeout

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-release-notes/+bug/2063315/+subscriptions
[Kernel-packages] [Bug 2063315] Re: Suspend & Resume functionality broken/timesout in GCE
Fixed in linux-gcp 6.8.0-1012.13, which included a patch suggested by upstream: https://lore.kernel.org/kvm/cacgkmeth_9baewekq862ygzwuozwg96z3g6oyqhzycj2jpu...@mail.gmail.com/T/

--
Status in Release Notes for Ubuntu: New
Status in linux-gcp package in Ubuntu: Fix Released
Status in linux-gcp source package in Noble: Fix Released
Status in linux-gcp source package in Oracular: Fix Released