Public bug reported:

SRU Justification

[Impact]

[Azure][CVM] Include the swiotlb patch to improve disk/network
performance

Description
As we discussed, there will be new CVM-supporting linux-azure kernels based
on v5.13 and v5.15. I'm requesting that the patch below be included in both
kernels, because it significantly improves disk/network performance:

swiotlb: Split up single swiotlb lock:
https://github.com/intel/tdx/commit/4529b5784c141782c72ec9bd9a92df2b68cb7d45

We have tested the patch with the upstream 5.16-rc8 kernel.
Note that the patch is unlikely to land in the mainline kernel: the community
is trying to resolve the lock contention issue in the swiotlb code with a
different per-device, per-queue implementation, which will need quite some
time to be finalized. Until that happens, we need this out-of-tree patch to
achieve good disk/network performance for CVM GA on Azure.

(The v5.4-based linux-azure-cvm kernel does not need the patch, because
it uses a private bounce buffer implementation, drivers/hv/hv_bounce.c,
which does not have the I/O performance issue caused by lock contention
in the mainline kernel's swiotlb code.)
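
For context, below is a minimal userspace sketch of the idea behind the
patch. It is not the patch itself (the real change modifies
kernel/dma/swiotlb.c): instead of one global spinlock serializing every
bounce-buffer slot allocation, the slot pool is split into areas, each with
its own lock, and each CPU is steered to its own area so that concurrent
allocations rarely contend. All names here (swiotlb_area, area_alloc_slot,
NUM_AREAS) are illustrative, not taken from the patch.

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>
    #include <stdbool.h>
    #include <stdio.h>

    #define NUM_AREAS      8    /* e.g. roughly one area per CPU */
    #define SLOTS_PER_AREA 64   /* bounce-buffer slots owned by each area */

    struct swiotlb_area {
            pthread_mutex_t lock;        /* protects this area's bitmap only */
            bool used[SLOTS_PER_AREA];   /* slot allocation bitmap */
    };

    static struct swiotlb_area areas[NUM_AREAS];

    /* Steer the caller to "its" area so concurrent CPUs rarely collide. */
    static struct swiotlb_area *current_area(void)
    {
            int cpu = sched_getcpu();

            return &areas[(cpu < 0 ? 0 : cpu) % NUM_AREAS];
    }

    /* Allocate one slot; contention is now per-area instead of global. */
    static int area_alloc_slot(struct swiotlb_area *a)
    {
            int slot = -1;

            pthread_mutex_lock(&a->lock);
            for (int i = 0; i < SLOTS_PER_AREA; i++) {
                    if (!a->used[i]) {
                            a->used[i] = true;
                            slot = i;
                            break;
                    }
            }
            pthread_mutex_unlock(&a->lock);

            return slot; /* -1: area full; real code would fall back to other areas */
    }

    int main(void)
    {
            for (int i = 0; i < NUM_AREAS; i++)
                    pthread_mutex_init(&areas[i].lock, NULL);

            struct swiotlb_area *a = current_area();
            printf("allocated slot %d in area %ld\n",
                   area_alloc_slot(a), (long)(a - areas));
            return 0;
    }

With a single global lock, all vCPUs of a large VM (e.g. the 256-vCPU
configuration tested below) serialize on one spinlock for every DMA bounce;
with per-area locks, contention drops roughly in proportion to the number of
areas, which is where the disk/network throughput gain comes from.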

[Test Case]

[Microsoft tested]

I tried the April-27 amd64 test kernel and it worked great for me:
1. The test kernel booted successfully with 256 virtual CPUs + 100 GB of
memory.
2. The kernel worked when I changed the MTU of the NetVSC NIC.
3. The Hyper-V HeartBeat/TimeSync/ShutDown VMBus devices also worked as
expected.
4. I ran some quick disk I/O and network stress tests and found no issues.

When I ran the above tests, I changed the low MMIO size to 3 GB (the
setting for a VM on Azure today) with "Set-VM decui-u2004-cvm
-LowMemoryMappedIoSpace 3GB".

Our test team will do more testing, including performance tests. We
expect the performance of this v5.15 test kernel to be on par with
the v5.4 linux-azure-cvm kernel.

[Where things could go wrong]

Disk and network I/O could fail or continue to suffer from poor
performance caused by swiotlb lock contention.

[Other Info]

SF: #00332721

** Affects: linux-azure (Ubuntu)
     Importance: Undecided
         Status: New

** Package changed: linux (Ubuntu) => linux-azure (Ubuntu)

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1971701

Title:
  Azure: swiotlb patch needed for CVM

Status in linux-azure package in Ubuntu:
  New
