[Kernel-packages] [Bug 2045586] Re: livecd-rootfs uses losetup -P for theoretically reliable/synchronous partition setup but it's not reliable

2024-02-09 Thread Catherine Redfield
I've been seeing this bug for as long as losetup has been used (jammy),
and the previously merged fix significantly improves the build success
rate in jammy (0/3 builds succeeded without the patch vs 5/5 with the
patch applied this week).  Is someone already looking to SRU that patch
back to mantic and jammy, or should I work on it?  Even if it's not
perfect, I think it would be extremely beneficial.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2045586

Title:
  livecd-rootfs uses losetup -P for theoretically reliable/synchronous
  partition setup but it's not reliable

Status in linux package in Ubuntu:
  New
Status in livecd-rootfs package in Ubuntu:
  Fix Released
Status in util-linux package in Ubuntu:
  New
Status in linux source package in Jammy:
  New
Status in livecd-rootfs source package in Jammy:
  New
Status in util-linux source package in Jammy:
  New
Status in linux source package in Mantic:
  New
Status in livecd-rootfs source package in Mantic:
  New
Status in util-linux source package in Mantic:
  New
Status in linux source package in Noble:
  New
Status in livecd-rootfs source package in Noble:
  Fix Released
Status in util-linux source package in Noble:
  New

Bug description:
  In mantic, we migrated livecd-rootfs to use losetup -P instead of
  kpartx, with the expectation that this would give us a reliable,
  race-free way of loop-mounting partitions from a disk image during
  image build.

  In noble, we are finding that it is no longer reliable, and in fact
  fails rather often.
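
  The symptom described here (the expected loop partition device not
  being present after losetup -P returns) can be observed with a small
  polling helper. This is only a minimal sketch under assumptions: the
  function names, the timeout values, and the /dev/loop0 example are
  hypothetical and are not taken from livecd-rootfs. Polling detects
  the missing device node; it is not presented as the fix.

```python
import os
import time


def partition_path(loop_device: str, partition: int) -> str:
    """Return the device node name of a loop partition, e.g. /dev/loop0p1."""
    return f"{loop_device}p{partition}"


def wait_for_path(path: str, timeout: float = 10.0, interval: float = 0.1) -> bool:
    """Poll until `path` exists or `timeout` seconds elapse.

    Returns True if the path appeared in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

  A caller might run wait_for_path(partition_path("/dev/loop0", 1))
  after attaching the image and treat a False result as the failure
  reported in this bug.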

  It is most noticeable with riscv64 builds, which is the architecture
  where we most frequently ran into problems before with kpartx.  The
  first riscv64+generic build in noble where the expected loop partition
  device is not available is

    https://launchpad.net/~ubuntu-cdimage/+livefs/ubuntu/noble/cpc/+build/531790

  The failure is however not unique to riscv64, and the autopkgtest for
  the latest version of livecd-rootfs (24.04.7) - an update that
  specifically tries to add more debugging code for this scenario - has
  also failed on ppc64el.

    https://autopkgtest.ubuntu.com/packages/l/livecd-rootfs/noble/ppc64el

  The first failure happened on November 16.  While there has been an
  update to the util-linux package in noble, this did not land until
  November 23.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2045586/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 2045586] Re: livecd-rootfs uses losetup -P for theoretically reliable/synchronous partition setup but it's not reliable

2024-02-21 Thread Catherine Redfield
I tested the flock-based solution with some of the CPC pipelines in
jammy and saw consistently clean builds (30 successful images built
yesterday).  Thank you very much for everyone's hard work debugging and
fixing this race condition!
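
For readers unfamiliar with the technique, the general idea behind a
flock-based approach can be sketched as follows. This is an
illustration under assumptions, not the code merged into
livecd-rootfs: the helper names are hypothetical, and the sketch only
shows how an exclusive BSD advisory lock is held on a file (for
example a loop device node) while other work is done.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def exclusive_lock(path: str):
    """Hold an exclusive BSD advisory lock (flock) on `path` for the with-block."""
    fd = os.open(path, os.O_RDONLY)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until no other holder remains
        yield fd
    finally:
        os.close(fd)  # closing the descriptor releases the lock


def try_lock(path: str) -> bool:
    """Return True if an exclusive lock on `path` can be taken right now."""
    fd = os.open(path, os.O_RDONLY)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except BlockingIOError:
        return False
    finally:
        os.close(fd)
```

Usage would look like `with exclusive_lock("/dev/loop0"): ...`, where
/dev/loop0 is a hypothetical device. Other processes that take a lock
on the same device before touching it then wait until the with-block
exits.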

Title:
  livecd-rootfs uses losetup -P for theoretically reliable/synchronous
  partition setup but it's not reliable

Status in linux package in Ubuntu:
  New
Status in livecd-rootfs package in Ubuntu:
  Fix Released
Status in util-linux package in Ubuntu:
  New
Status in linux source package in Jammy:
  New
Status in livecd-rootfs source package in Jammy:
  Fix Released
Status in util-linux source package in Jammy:
  New
Status in linux source package in Mantic:
  New
Status in livecd-rootfs source package in Mantic:
  New
Status in util-linux source package in Mantic:
  New
Status in linux source package in Noble:
  New
Status in livecd-rootfs source package in Noble:
  Fix Released
Status in util-linux source package in Noble:
  New

Bug description:
  [impact]
  In mantic, we migrated livecd-rootfs to use losetup -P instead of
  kpartx, with the expectation that this would give us a reliable,
  race-free way of loop-mounting partitions from a disk image during
  image build.

  In noble, we are finding that it is no longer reliable, and in fact
  fails rather often.

  It is most noticeable with riscv64 builds, which is the architecture
  where we most frequently ran into problems before with kpartx.  The
  first riscv64+generic build in noble where the expected loop partition
  device is not available is

    https://launchpad.net/~ubuntu-cdimage/+livefs/ubuntu/noble/cpc/+build/531790

  The failure is however not unique to riscv64, and the autopkgtest for
  the latest version of livecd-rootfs (24.04.7) - an update that
  specifically tries to add more debugging code for this scenario - has
  also failed on ppc64el.

    https://autopkgtest.ubuntu.com/packages/l/livecd-rootfs/noble/ppc64el

  The first failure happened on November 16.  While there has been an
  update to the util-linux package in noble, this did not land until
  November 23.

  The losetup usage has been backported to Jammy, and sees frequent
  failures there.

  [test case]
  The autopkgtests will provide enough confidence that the changes are
  not completely broken. Whether the change helps with the races on
  riscv can be "tested in prod" just as well as any other way.

  [regression potential]
  If the backport has been done incorrectly, image builds can fail (and
  the autopkgtests will fail if it has been completely bungled). This
  can be quickly handled. There is no foreseeable way for this to result
  in successful builds but broken images, which would be a much more
  difficult failure mode to unpick.



[Kernel-packages] [Bug 2063315] Re: Suspend & Resume functionality broken/timesout in GCE

2024-09-18 Thread Catherine Redfield
** Changed in: linux-gcp (Ubuntu)
   Status: Triaged => Fix Released

** Changed in: linux-gcp (Ubuntu Noble)
   Status: Triaged => Fix Released

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-gcp in Ubuntu.
https://bugs.launchpad.net/bugs/2063315

Title:
  Suspend & Resume functionality broken/timesout in GCE

Status in Release Notes for Ubuntu:
  New
Status in linux-gcp package in Ubuntu:
  Fix Released
Status in linux-gcp source package in Noble:
  Fix Released

Bug description:
  [Impact]
   
  Suspend/Resume capability is broken in all noble images with kernel
  version 6.8.0-1007-gcp.

  GCE offers the capability to "Suspend" a VM to conserve power/lower
  costs when the instance is not in use [0]. It uses ACPI S3 signals to
  tell the guest to power down. This capability no longer works in the
  latest kernel with the following error:

  ```
  Operation type [suspend] failed with message "Instance suspend failed due to guest timeout."
  ```

  which points to the following [1].
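
  Since the suspend request relies on the guest handling ACPI S3, one
  guest-side check is whether the kernel offers the "deep"
  (suspend-to-RAM) mode in /sys/power/mem_sleep. The parser below is
  only a sketch: the function name is hypothetical, and using this
  check for this bug is an assumption rather than something taken from
  the report. It relies on the kernel's documented format for that
  file, where the active mode is shown in square brackets.

```python
from typing import List, Optional, Tuple


def parse_mem_sleep(text: str) -> Tuple[List[str], Optional[str]]:
    """Parse /sys/power/mem_sleep content such as 's2idle [deep]'.

    Returns (all listed modes, the bracketed active mode or None).
    """
    modes: List[str] = []
    active: Optional[str] = None
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            active = token
        modes.append(token)
    return modes, active
```

  On a guest, the input would come from reading /sys/power/mem_sleep;
  "deep" missing from the returned list would suggest that the kernel
  does not offer suspend-to-RAM.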

  

  Refs:

  [0]: https://cloud.google.com/compute/docs/instances/suspend-resume-instance

  [1]: https://cloud.google.com/compute/docs/troubleshooting/troubleshooting-suspend-resume#there_was_a_guest_timeout

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-release-notes/+bug/2063315/+subscriptions




[Kernel-packages] [Bug 2063315] Re: Suspend & Resume functionality broken/timesout in GCE

2024-09-20 Thread Catherine Redfield
Fixed in linux-gcp 6.8.0-1012.13, which includes a patch suggested by
upstream:
https://lore.kernel.org/kvm/cacgkmeth_9baewekq862ygzwuozwg96z3g6oyqhzycj2jpu...@mail.gmail.com/T/

Title:
  Suspend & Resume functionality broken/timesout in GCE

Status in Release Notes for Ubuntu:
  New
Status in linux-gcp package in Ubuntu:
  Fix Released
Status in linux-gcp source package in Noble:
  Fix Released
Status in linux-gcp source package in Oracular:
  Fix Released
