I have a bit of an update on the sev-snp delay in booting.  I am still trying 
to put everything together, and I will finalise everything on Monday after I  
finish going through all of the logs I generated today.  


I traced through apic/x2apic call stack and I've been digging through the 
kernel code, and I suspect I know the issue.   

I think it comes down to a mixture of SNP-SEV support not being fully
available in the kernel, and in the hypervisor with how it handles the
APIC (the interrupt controller).  Commit 1dfe571c12cf introduced the
kernel support for kvm handling of  SEV-SNP.   This was not added to the
kernel until kernel version 6.11.  I need to dig through the kvm source
to see how it handles the apix/x2apic with SNP-SEV still.

Without the support in the kernel and hypervisor, it seems that the
behaviour is somewhat undefined and relies on what is happening in the
other guest VMs on the machine.   This will explain why the pause does
not hit every time you boot, but once it does hit, it reproduces every
time.


One thing I have noticed is that when a VM f first started, it does not always  
happen.  After a several reboots it usually does happen.  Once it has happened 
once though, it happens every reboot after that.


  I noticed GCP supports SEV-SNP, so I created an instance there, and 
installed the AWS kernel.  I then installed the GCP kernel on the AWS instance. 
 The AWS instance is hitting the pause with the GCP, and the AWS kernel, while 
the GCP instance has not hit it after about 100 reboots using the AWS kernel..  

This makes me think it is the hypervisor rather than the kernel causing
the issue.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-aws in Ubuntu.
https://bugs.launchpad.net/bugs/2076217

Title:
  booting an 24.10 or 24.04 ec2 instance with SEV-SNP enabled hangs
  sometimes

Status in cloud-images:
  New
Status in linux-aws package in Ubuntu:
  In Progress

Bug description:
  I'm trying to test an EC2 instance with SEV-SNP enabled. But the boot
  process hangs at:

  [snipped]
  [    0.609079] printk: legacy console [ttyS0] enabled
  [    5.405931] ACPI: Core revision 20230628
  [    5.430448] clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, 
max_idle_ns: 30580167144 ns
  [    5.473066] APIC: Switch to symmetric I/O mode setup

  
  Steps to reproduce are:

  $ AWS_DEFAULT_REGION=eu-west-1 aws ec2 run-instances --image-id
  ami-005a44922e2ffd1fa --instance-type m6a.large --cpu-options
  AmdSevSnp=enabled --key-name toabctl --tag-specifications
  'ResourceType=instance,Tags=[{Key=Name,Value=toabctl-2410-sevsnp-
  testing}]'

  The AMI ami-005a44922e2ffd1fa is ubuntu/images-testing/hvm-ssd-
  gp3/ubuntu-oracular-daily-amd64-server-20240716 and does contain
  6.8.0-1008-aws .

  Attached is the full output from the EC2 serial console.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-images/+bug/2076217/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp

Reply via email to