** Description changed:

  [Impact]
  Firecracker process crashes with an "out of memory" error when it attempts to
  run the vCPU for the first time, even if the system has enough available
  memory:
  ```
  2025-05-02T16:31:21.850912998 [daf77128-f177-4a01-9b97-a88dd9faa78f:fc_vcpu 0] Failure during vcpu run: Out of memory (os error 12)
  ```
  
  The issue is triggered by a race condition caused by the VMM thread sending a
- SIGRTMIN to the vCPU thread, while it is starting
- the nx_huge_page_recovery_thread. This makes the thread creation fail, but due
- to a bug in the kernel, it is classified as a ENOMEM, instead of a
- ERESTARTNOINTR, which should be retried.
+ SIGRTMIN to the vCPU thread while it is starting the
+ nx_huge_page_recovery_thread. This makes the thread creation fail, but due to a
+ bug in the kernel, it is classified as an ENOMEM instead of an ERESTARTNOINTR,
+ which should be retried.
  
+ This only affects 6.8 kernels, since the bug was introduced by the following
+ commits, backported to the noble:linux 6.8.0-58.60 kernel as part of the
+ upstream stable updates (LP: #2101915):
+ - 43fb96ae7855 ("KVM: x86/mmu: Ensure NX huge page recovery thread is alive before waking")
+ - 931656b9e2ff ("kvm: defer huge page recovery vhost task to later")
+ - d96c77bd4eeb ("KVM: x86: switch hugepage recovery thread to vhost_task")
  
  [Fix]
  Cherry-pick cb380909ae3b ("vhost: return task creation error instead of NULL")
  and 916b7f42b3b3 ("kvm: retry nx_huge_page_recovery_thread creation").
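
To illustrate the failure mode and the intent of the two cherry-picks, here is a minimal Python model. It is a sketch under stated assumptions, not kernel code: `make_flaky_creator`, `start_lossy`, `start_with_retry`, and `TaskCreationError` are hypothetical names, and the retry loop only stands in for the retry; how the kernel patches actually retry is not modeled. `ERESTARTNOINTR = 513` is the kernel-internal value, which Python's `errno` module does not expose.

```python
import errno
import os

ERESTARTNOINTR = 513  # kernel-internal code; assumption: modeled here as a plain int


class TaskCreationError(Exception):
    """Hypothetical error carrying the specific failure code."""

    def __init__(self, code):
        super().__init__(f"task creation failed with code {code}")
        self.code = code


def make_flaky_creator(interruptions):
    """Return a hypothetical creator that is 'interrupted by a signal' N times."""
    remaining = [interruptions]

    def create_task():
        if remaining[0] > 0:
            remaining[0] -= 1
            raise TaskCreationError(ERESTARTNOINTR)
        return "recovery-task"

    return create_task


def start_lossy(create_task):
    """Model of the bug: the specific error is dropped (like a NULL return),
    so the caller can only report ENOMEM."""
    try:
        return create_task()
    except TaskCreationError:
        raise OSError(errno.ENOMEM, os.strerror(errno.ENOMEM))


def start_with_retry(create_task, max_attempts=5):
    """Model of the fix: the error code is preserved and restartable
    failures are retried instead of being surfaced."""
    for _ in range(max_attempts):
        try:
            return create_task()
        except TaskCreationError as exc:
            if exc.code != ERESTARTNOINTR:
                raise  # a real error: propagate it unchanged
    raise TaskCreationError(ERESTARTNOINTR)
```

In this model, `start_lossy(make_flaky_creator(1))` raises `OSError` with `errno.ENOMEM` (os error 12, as in the log above), while `start_with_retry(make_flaky_creator(1))` succeeds on the second attempt.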
  
- 
  [Test Case]
- 
+ 1) Launch a Noble c5.metal instance on AWS
+ 2) Install and boot into the linux-generic 6.8 kernel
+ 3) Install docker and aws-cli
+ 4) git clone https://github.com/firecracker-microvm/firecracker.git
+ 5) Go to the firecracker directory and run `./tools/devtool test -- -n16 integration_tests/functional/test_snapshot_basic.py::test_cycled_snapshot_restore`
+ 6) With this patchset, observe that all tests pass. Without it, a couple
+ of tests will fail with out-of-memory errors.
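
When triaging a failed run, it can help to confirm that the failure matches this bug's signature and is not an unrelated test failure. The following is a small sketch that scans captured Firecracker log text for the vCPU out-of-memory message quoted in [Impact]. The helper name `find_vcpu_oom` is hypothetical, and where the test harness stores its logs is not specified here, so the function takes the log text directly.

```python
import re

# Matches lines such as:
#   ... [<vm-id>:fc_vcpu 0] Failure during vcpu run: Out of memory (os error 12)
# \s+ also tolerates a line wrap between "fc_vcpu" and the vCPU index.
VCPU_OOM = re.compile(
    r"\[(?P<vm_id>[^:\]]+):fc_vcpu\s+(?P<vcpu>\d+)\]\s+"
    r"Failure during vcpu run: Out of memory \(os error 12\)"
)


def find_vcpu_oom(log_text):
    """Return a list of (vm_id, vcpu_index) for each matching failure."""
    return [
        (m.group("vm_id"), int(m.group("vcpu")))
        for m in VCPU_OOM.finditer(log_text)
    ]
```

An empty result for a failing test suggests the failure has a different cause and should be investigated separately.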
  
  [Where problems could occur]
- 
+ This patchset touches vhost_task_create(), making it return specific error
+ pointers instead of just NULL. Problems could occur if its callers
+ mishandle the return value.
+ More broadly, it also touches code responsible for MM of KVM VMs, and issues
+ could appear as these VMs failing to initialize.
  
  [Other info]
  SF #00410184

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2109859

Title:
  KVM bug causes Firecracker crash when it runs the vCPU for the first
  time

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2109859/+subscriptions


-- 
ubuntu-bugs mailing list
[email protected]
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
