Public bug reported:

SRU Justification

[Impact]

A Linux guest on Hyper-V/Azure can occasionally crash during early kernel boot because of unusual host behavior:

1. The host assigns a VF to the guest;
2. The host immediately unassigns the VF from the guest; //Dexuan: due to race condition bugs in the Linux vPCI driver, Linux can crash.
3. The host assigns the VF to the guest again.

Starting in late 2022 (around Nov 2022), Linux guests on Azure began to crash more frequently because of a host-side update at that time: a new host/hypervisor feature for handling "correctable memory errors" can cause many successive VF remove/add events, so the race condition bugs in the Linux vPCI driver surface much more easily.

The Hyper-V team is implementing a batching mechanism so that the guest will get far fewer VF remove/add events (ETA: June 2023), but meanwhile we should also get the Linux race condition bugs fixed so that Linux guests won't crash even if they receive successive VF remove/add events.

[Test Plan]

Microsoft tested

[Regression potential]

PCI devices may not get registered, or VMs may crash.

[Other Info]

SF: #00349076

** Affects: linux-azure (Ubuntu)
   Importance: Undecided
   Status: New

** Affects: linux-azure (Ubuntu Focal)
   Importance: Medium
   Assignee: Tim Gardner (timg-tpi)
   Status: In Progress

** Affects: linux-azure (Ubuntu Jammy)
   Importance: Medium
   Assignee: Tim Gardner (timg-tpi)
   Status: In Progress

** Affects: linux-azure (Ubuntu Lunar)
   Importance: Medium
   Assignee: Tim Gardner (timg-tpi)
   Status: In Progress

** Also affects: linux (Ubuntu Lunar)
   Importance: Undecided
   Status: New

** Also affects: linux (Ubuntu Focal)
   Importance: Undecided
   Status: New

** Also affects: linux (Ubuntu Jammy)
   Importance: Undecided
   Status: New

** Changed in: linux (Ubuntu Focal)
   Importance: Undecided => Medium

** Changed in: linux (Ubuntu Focal)
   Status: New => In Progress

** Changed in: linux (Ubuntu Focal)
   Assignee: (unassigned) => Tim Gardner (timg-tpi)

** Changed in: linux (Ubuntu Jammy)
   Importance: Undecided => Medium

** Changed in: linux (Ubuntu Jammy)
   Status: New => In Progress

** Changed in: linux (Ubuntu Jammy)
   Assignee: (unassigned) => Tim Gardner (timg-tpi)

** Changed in: linux (Ubuntu Lunar)
   Importance: Undecided => Medium

** Changed in: linux (Ubuntu Lunar)
   Status: New => In Progress

** Changed in: linux (Ubuntu Lunar)
   Assignee: (unassigned) => Tim Gardner (timg-tpi)

** Package changed: linux (Ubuntu) => linux-azure (Ubuntu)

-- 
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2023594

Title:
  Case [Azure] Fix VM crash/hang issues due to fast VF add/remove events

Status in linux-azure package in Ubuntu:
  New
Status in linux-azure source package in Focal:
  In Progress
Status in linux-azure source package in Jammy:
  In Progress
Status in linux-azure source package in Lunar:
  In Progress

Mailing list: https://launchpad.net/~kernel-packages
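
For readers unfamiliar with this class of bug, the assign / unassign / reassign sequence from [Impact] can be modelled in a few lines. The following is only a toy sketch in Python, not the vPCI driver's code and not the fix proposed for this SRU: the names `VirtualBus`, `_state_lock`, `on_vf_add`, `on_vf_remove`, `replay` and `stress` are all hypothetical. It assumes that the general remedy for such races is to serialize the add and remove handlers on one lock and to make the remove path tolerate a device that is already gone.

```python
import threading


class VirtualBus:
    """Toy model of a guest-side bus receiving VF add/remove events."""

    def __init__(self):
        # Hypothetical per-bus lock: every event handler takes it, so an
        # add and a remove can never interleave half-way through.
        self._state_lock = threading.Lock()
        self.device = None  # None means "no VF currently present"
        self.log = []

    def on_vf_add(self, vf_id):
        with self._state_lock:
            # Build the device completely before publishing it, so no
            # other handler can observe a half-initialized object.
            dev = {"id": vf_id, "ready": False}
            dev["ready"] = True
            self.device = dev
            self.log.append(("add", vf_id))

    def on_vf_remove(self, vf_id):
        with self._state_lock:
            dev = self.device
            # Tolerate a remove for a device that is absent or different
            # instead of dereferencing stale state.
            if dev is None or dev["id"] != vf_id:
                self.log.append(("remove-ignored", vf_id))
                return
            self.device = None
            self.log.append(("remove", vf_id))


def replay(events):
    """Deliver (kind, vf_id) events in order and return the bus."""
    bus = VirtualBus()
    handlers = {"add": bus.on_vf_add, "remove": bus.on_vf_remove}
    for kind, vf_id in events:
        handlers[kind](vf_id)
    return bus


def stress(rounds):
    """Deliver many add/remove events from concurrent threads."""
    bus = VirtualBus()
    threads = []
    for _ in range(rounds):
        threads.append(threading.Thread(target=bus.on_vf_add, args=(1,)))
        threads.append(threading.Thread(target=bus.on_vf_remove, args=(1,)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return bus
```

Calling `replay([("add", 1), ("remove", 1), ("add", 1)])` walks through the three host steps in order, while `stress(50)` delivers the same kinds of events from many threads at once. In this model the bus always ends up with either no device or a fully initialized one, whatever order the events arrive in.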