I have a bit of an update on the sev-snp delay in booting. I am still trying to put everything together, and I will finalise everything on Monday after I finish going through all of the logs I generated today.
I traced through apic/x2apic call stack and I've been digging through the kernel code, and I suspect I know the issue. I think it comes down to a mixture of SNP-SEV support not being fully available in the kernel, and in the hypervisor with how it handles the APIC (the interrupt controller). Commit 1dfe571c12cf introduced the kernel support for kvm handling of SEV-SNP. This was not added to the kernel until kernel version 6.11. I need to dig through the kvm source to see how it handles the apix/x2apic with SNP-SEV still. Without the support in the kernel and hypervisor, it seems that the behaviour is somewhat undefined and relies on what is happening in the other guest VMs on the machine. This will explain why the pause does not hit every time you boot, but once it does hit, it reproduces every time.  One thing I have noticed is that when a VM f first started, it does not always happen. After a several reboots it usually does happen. Once it has happened once though, it happens every reboot after that.  I noticed GCP supports SEV-SNP, so I created an instance there, and installed the AWS kernel. I then installed the GCP kernel on the AWS instance. The AWS instance is hitting the pause with the GCP, and the AWS kernel, while the GCP instance has not hit it after about 100 reboots using the AWS kernel.. This makes me think it is the hypervisor rather than the kernel causing the issue. -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-aws in Ubuntu. https://bugs.launchpad.net/bugs/2076217 Title: booting an 24.10 or 24.04 ec2 instance with SEV-SNP enabled hangs sometimes Status in cloud-images: New Status in linux-aws package in Ubuntu: In Progress Bug description: I'm trying to test an EC2 instance with SEV-SNP enabled. But the boot process hangs at: [snipped] [ 0.609079] printk: legacy console [ttyS0] enabled [ 5.405931] ACPI: Core revision 20230628 [ 5.430448] clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 30580167144 ns [ 5.473066] APIC: Switch to symmetric I/O mode setup Steps to reproduce are: $ AWS_DEFAULT_REGION=eu-west-1 aws ec2 run-instances --image-id ami-005a44922e2ffd1fa --instance-type m6a.large --cpu-options AmdSevSnp=enabled --key-name toabctl --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=toabctl-2410-sevsnp- testing}]' The AMI ami-005a44922e2ffd1fa is ubuntu/images-testing/hvm-ssd- gp3/ubuntu-oracular-daily-amd64-server-20240716 and does contain 6.8.0-1008-aws . Attached is the full output from the EC2 serial console. To manage notifications about this bug go to: https://bugs.launchpad.net/cloud-images/+bug/2076217/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp