Public bug reported: SRU Justification
[Impact]

[Azure][CVM] Include the swiotlb patch to increase disk/network performance.

Description

As we discussed, there will be new CVM-supporting linux-azure kernels based on v5.13 and v5.15. Here I'm requesting that the below patch be included in both kernels, because it can significantly improve disk/network performance:

swiotlb: Split up single swiotlb lock:
https://github.com/intel/tdx/commit/4529b5784c141782c72ec9bd9a92df2b68cb7d45

We have tested the patch with the upstream 5.16-rc8.

BTW, the patch is unlikely to land in the mainline kernel, as the community is trying to resolve the lock contention in the swiotlb code with a different per-device, per-queue implementation, which will take quite some time to finalize. Until that happens, we need this out-of-tree patch to achieve good disk/network performance for CVM GA on Azure.

(BTW, the v5.4-based linux-azure-cvm kernel does not need the patch, because it uses a private bounce buffer implementation, drivers/hv/hv_bounce.c, which does not suffer from the I/O performance issue caused by lock contention in the mainline kernel's swiotlb code.)

[Test Case]

[Microsoft tested] I tried the April-27 amd64 test kernel and it worked great for me:

1. The test kernel booted successfully with 256 virtual CPUs + 100 GB of memory.
2. The kernel worked when I changed the MTU of the NetVSC NIC.
3. The Hyper-V HeartBeat/TimeSync/ShutDown VMBus devices also worked as expected.
4. I ran some quick disk I/O and network stress tests and found no issues.

For the above tests, I set the low MMIO size to 3 GB (the setting for a VM on Azure today) with "set-vm decui-u2004-cvm -LowMemoryMappedIoSpace 3GB".

Our test team will do more testing, including performance tests. We expect the performance of this v5.15 test kernel to be on par with the v5.4 linux-azure-cvm kernel.

[Where things could go wrong]

Networking could fail or continue to suffer from poor performance.
[Other Info]

SF: #00332721

** Affects: linux-azure (Ubuntu)
   Importance: Undecided
   Status: New

** Package changed: linux (Ubuntu) => linux-azure (Ubuntu)

** Description changed:
--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1971701

Title:
  Azure: swiotlb patch needed for CVM

Status in linux-azure package in Ubuntu:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-azure/+bug/1971701/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp