** Summary changed: - jammy/linux-aws hibernation timeout on xen instances + Jammy / Kinetic: Enable Hibernation for Xen Based Instance Types
** Also affects: linux-aws (Ubuntu Kinetic) Importance: Undecided Status: New ** Also affects: linux-aws (Ubuntu Jammy) Importance: Undecided Status: New ** Changed in: linux-aws (Ubuntu Jammy) Status: New => In Progress ** Changed in: linux-aws (Ubuntu Kinetic) Status: New => In Progress ** Changed in: linux-aws (Ubuntu Jammy) Importance: Undecided => Critical ** Changed in: linux-aws (Ubuntu Kinetic) Importance: Undecided => Critical ** Changed in: linux-aws (Ubuntu Jammy) Assignee: (unassigned) => gerald.yang (gerald-yang-tw) ** Changed in: linux-aws (Ubuntu Kinetic) Assignee: (unassigned) => Matthew Ruffell (mruffell) ** Description changed: - Hibernation testing of jammy/linux-aws 5.15.0-1003-aws is failing on all - xen instance types (c3/c4/i3/m3/m4/r3/r4/t2). The failure happens while - attempting to resume from the first attempt to hibernate. Testing on - nitro instances types (c5/m5/r5/t3) all pass. + [Impact] - After the resume, the system is inaccessible via ssh. The console - screenshot does change, but the console log obtained from `aws ec2 get- - console-output` does not. + Hibernation currently fails for all AWS Xen instance types + (c3/c4/i3/m3/m4/r3/r4/t2) with all Jammy 5.15 and Kinetic 5.19 linux-aws + kernels. + + When attempting to hibernate, the system gets stuck in + sync_inodes_one_sb() when processing the rootfs, fails to hibernate, and + shuts down. When you start the instance, it starts fresh, and does not + resume from the incomplete hibernation image. Networking is also broken, + and you cannot ssh in. + + Upon review of the jammy/linux-aws git log, it appears that the kernel + is missing AWS hibernation enablement patches entirely. These need to be + included to get hibernation working. + + [Fix] + + Hibernation currently works on the Amazon Linux 2 5.15 Kernel: + https://github.com/amazonlinux/linux/tree/amazon-5.15.y/mainline + + After careful review of the amazon-5.15.y/mainline branch, we have found + the below set of patches authored by Amazon AWS Hibernation team to be + minimally sufficient to get hibernation working on both Jammy 5.15 and + Kinetic 5.19. + + x86: Disable KASLR when Xen is detected + xen: Restore xen-pirqs on resume from hibernation + xen-netfront: call netif_device_attach on resume + xen: Only restore the ACPI SCI interrupt in xen_restore_pirqs. + xen: restore pirqs on resume from hibernation. + block: xen-blkfront: consider new dom0 features on restore + x86: tsc: avoid system instability in hibernation + xen-blkfront: Fixed blkfront_restore to remove a call to negotiate_mq + Revert "xen: dont fiddle with event channel masking in suspend/resume" + PM / hibernate: update the resume offset on SNAPSHOT_SET_SWAP_AREA + x86/xen: close event channels for PIRQs in system core suspend callback + xen/events: add xen_shutdown_pirqs helper function + x86/xen: save and restore steal clock + xen/time: introduce xen_{save,restore}_steal_clock + xen-netfront: add callbacks for PM suspend and hibernation support + xen-blkfront: add callbacks for PM suspend and hibernation + x86/xen: add system core suspend and resume callbacks + x86/xen: Introduce new function to map HYPERVISOR_shared_info on Resume + xenbus: add freeze/thaw/restore callbacks support + xen/manage: introduce helper function to know the on-going suspend mode + xen/manage: keep track of the on-going suspend mode + + These patches will be carried as SAUCE patches, and their subjects + marked with "UBUNTU: SAUCE [aws]". Their upstream is the Amazon + Hibernation team, with the repo being the Amazon Linux 2 kernel repo. + + [Testcase] + + 1. Log into Amazon EC2. + 2. Select Launch Instance. + 3. Under Instance Type, select any from (c3/c4/i3/m3/m4/r3/r4/t2). I suggest t2.medium. + 4. Select the "Ubuntu 22.04 LTS HVM (SSD type)" AMI in the quicklaunch pane. + 5. Select your SSH keypair. + 6. In storage, select 20gb. Go to the advanced tab, and set Encrypted: Yes. + 7. Under Advanced Settings for the instance, set "Stop - Hibernate" to Enable. + 8. Create the Instance. SSH in. + 9. Wait 5 minutes for hibinit-agent to create /swap-hibinit swapfile and configure grub. + 10. Start a screen session. Echo some text and then detach with ctrl-d. + 11. Log out from instance. + 12. In EC2, select "Instance State" > "Hibernate". + 13. Wait 30 seconds to one minute. The state will go from "Stopping" to "Stopped". + 14. Start the instance again. + 15. SSH in. + 16. Attempt to resume screen session with "screen -r". + + If you are not able to ssh into the instance, hibernation had failed. If + ssh works and the screen session is still running, hibernation was + successful. + + Alternatively, the CPC team can run their Hibernation testsuite over + Jammy and Kinetic. + + We have built test kernels for Jammy and Kinetic with the patches, and + they are available in the below ppa: + + If you try and hibernate and resume with the test kernels, hibernation + is successful. + + [Where problems could occur] + + We are adding a significant amount of code to the Xen subsystem, spread + across many commits. This code has not been mainlined, and is instead + maintained out of tree by the Amazon AWS Hibernation team. + + The changes target hibernation, block devices, and clock devices, + specific to those used on AWS Xen instances. Most of these patches have + been applied to Xenial, Bionic, Focal and other series for a long time, + but some patches are new for 5.15 onward. + + The changes will only target linux-aws to try and limit regression risk + to AWS users, and any regressions will be limited to users of Xen based + instance types (c3/c4/i3/m3/m4/r3/r4/t2), covering both Xen 4.2 and Xen + 4.11. + + If a regression were to occur, the instance would likely fail to + hibernate, and at worst, write an incomplete hibernation image to the + swapfile. The kernel will see this on start, and instead of resuming + from the hibernation image, will start fresh. It is unlikely to cause + any filesystem corruption on the rootfs, but any in progress + computations at the time of hibernation could be lost. The current + broken behaviour breaks networking, and users would have to power cycle + the instance a few times before they can ssh in again. ** Tags added: jammy kinetic sts -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-aws in Ubuntu. https://bugs.launchpad.net/bugs/1968062 Title: Jammy / Kinetic: Enable Hibernation for Xen Based Instance Types Status in linux-aws package in Ubuntu: In Progress Status in linux-aws source package in Jammy: In Progress Status in linux-aws source package in Kinetic: In Progress Bug description: [Impact] Hibernation currently fails for all AWS Xen instance types (c3/c4/i3/m3/m4/r3/r4/t2) with all Jammy 5.15 and Kinetic 5.19 linux- aws kernels. When attempting to hibernate, the system gets stuck in sync_inodes_one_sb() when processing the rootfs, fails to hibernate, and shuts down. When you start the instance, it starts fresh, and does not resume from the incomplete hibernation image. Networking is also broken, and you cannot ssh in. Upon review of the jammy/linux-aws git log, it appears that the kernel is missing AWS hibernation enablement patches entirely. These need to be included to get hibernation working. [Fix] Hibernation currently works on the Amazon Linux 2 5.15 Kernel: https://github.com/amazonlinux/linux/tree/amazon-5.15.y/mainline After careful review of the amazon-5.15.y/mainline branch, we have found the below set of patches authored by Amazon AWS Hibernation team to be minimally sufficient to get hibernation working on both Jammy 5.15 and Kinetic 5.19. x86: Disable KASLR when Xen is detected xen: Restore xen-pirqs on resume from hibernation xen-netfront: call netif_device_attach on resume xen: Only restore the ACPI SCI interrupt in xen_restore_pirqs. xen: restore pirqs on resume from hibernation. block: xen-blkfront: consider new dom0 features on restore x86: tsc: avoid system instability in hibernation xen-blkfront: Fixed blkfront_restore to remove a call to negotiate_mq Revert "xen: dont fiddle with event channel masking in suspend/resume" PM / hibernate: update the resume offset on SNAPSHOT_SET_SWAP_AREA x86/xen: close event channels for PIRQs in system core suspend callback xen/events: add xen_shutdown_pirqs helper function x86/xen: save and restore steal clock xen/time: introduce xen_{save,restore}_steal_clock xen-netfront: add callbacks for PM suspend and hibernation support xen-blkfront: add callbacks for PM suspend and hibernation x86/xen: add system core suspend and resume callbacks x86/xen: Introduce new function to map HYPERVISOR_shared_info on Resume xenbus: add freeze/thaw/restore callbacks support xen/manage: introduce helper function to know the on-going suspend mode xen/manage: keep track of the on-going suspend mode These patches will be carried as SAUCE patches, and their subjects marked with "UBUNTU: SAUCE [aws]". Their upstream is the Amazon Hibernation team, with the repo being the Amazon Linux 2 kernel repo. [Testcase] 1. Log into Amazon EC2. 2. Select Launch Instance. 3. Under Instance Type, select any from (c3/c4/i3/m3/m4/r3/r4/t2). I suggest t2.medium. 4. Select the "Ubuntu 22.04 LTS HVM (SSD type)" AMI in the quicklaunch pane. 5. Select your SSH keypair. 6. In storage, select 20gb. Go to the advanced tab, and set Encrypted: Yes. 7. Under Advanced Settings for the instance, set "Stop - Hibernate" to Enable. 8. Create the Instance. SSH in. 9. Wait 5 minutes for hibinit-agent to create /swap-hibinit swapfile and configure grub. 10. Start a screen session. Echo some text and then detach with ctrl-d. 11. Log out from instance. 12. In EC2, select "Instance State" > "Hibernate". 13. Wait 30 seconds to one minute. The state will go from "Stopping" to "Stopped". 14. Start the instance again. 15. SSH in. 16. Attempt to resume screen session with "screen -r". If you are not able to ssh into the instance, hibernation had failed. If ssh works and the screen session is still running, hibernation was successful. Alternatively, the CPC team can run their Hibernation testsuite over Jammy and Kinetic. We have built test kernels for Jammy and Kinetic with the patches, and they are available in the below ppa: If you try and hibernate and resume with the test kernels, hibernation is successful. [Where problems could occur] We are adding a significant amount of code to the Xen subsystem, spread across many commits. This code has not been mainlined, and is instead maintained out of tree by the Amazon AWS Hibernation team. The changes target hibernation, block devices, and clock devices, specific to those used on AWS Xen instances. Most of these patches have been applied to Xenial, Bionic, Focal and other series for a long time, but some patches are new for 5.15 onward. The changes will only target linux-aws to try and limit regression risk to AWS users, and any regressions will be limited to users of Xen based instance types (c3/c4/i3/m3/m4/r3/r4/t2), covering both Xen 4.2 and Xen 4.11. If a regression were to occur, the instance would likely fail to hibernate, and at worst, write an incomplete hibernation image to the swapfile. The kernel will see this on start, and instead of resuming from the hibernation image, will start fresh. It is unlikely to cause any filesystem corruption on the rootfs, but any in progress computations at the time of hibernation could be lost. The current broken behaviour breaks networking, and users would have to power cycle the instance a few times before they can ssh in again. To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux-aws/+bug/1968062/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp