Thanks Murilo, no test with the basic kernel is needed yet. But later, when we really SRU this, it will be good to do both an old-kernel and a HWE-kernel check. Let me add that to the verification steps (a sketch of it is below) ...
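A rough sketch of what those dual-kernel verification steps could look like (the guest name, nvme-disk.xml and the fio job are the ones from the reproduction steps further below; treating "old vs HWE" as GA 4.15 vs HWE 4.18+ is an assumption):

  # repeat the same run twice: once on the GA kernel, once on the HWE kernel
  host$ uname -r
  host$ virsh attach-device <domain> nvme-disk.xml --live
  guest$ fio --direct=1 --rw=randrw --refill_buffers --norandommap \
         --randrepeat=0 --ioengine=libaio --bs=4k --rwmixread=100 \
         --iodepth=16 --runtime=60 --name=job1 --filename=/dev/nvme0n1 \
         --numjobs=4
  # expectation: both kernels work; only the HWE kernel (which carries the
  # fix [1]) should show the large READ bandwidth improvement, the GA
  # kernel must merely not regress.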
** Description changed:

[Impact]

 * In the past qemu has generally not allowed MSI-X BAR mapping on VFIO.
   But there can be platforms (like ppc64 spapr) that can and want to do
   exactly that.

 * Backport two patches from upstream (in since qemu 2.12 / Disco).

 * Due to that there is a tremendous speedup, especially useful with page
   sizes bigger than 4k. This avoids the access being split into chunks
   and makes direct MMIO access possible for the guest.

[Test Case]

 * On ppc64 pass through an NVMe device to the guest and run I/O
   benchmarks, see below for details on how to set that up.
   Note: this needs the HWE kernel or another kernel fixup for [1].
   Note: the test should also be done with the non-HWE kernel; the
   expectation there is that it will not show the performance benefits,
   but still work fine.

[Regression Potential]

 * Changes:
   a) If the host driver allows mapping of MSI-X data, the entire BAR is
      mapped. This is only done if the kernel reports that capability [1],
      which ensures that qemu exposes the new behavior only on kernels
      able to support it (safe against regression in that regard).
   b) On ppc64 MSI-X emulation is disabled for VFIO devices; this is
      local to just this HW and will not affect other HW.

 Generally the regressions that come to mind are slight changes in
 behavior (real HW vs. the former emulation) that could cause trouble on
 some weird/old guests. But that is limited to PPC only, where only a
 small set of certified HW is really allowed.

 The mapping that might be added even on other platforms should not
 consume too much extra memory as long as it isn't used. Further, since
 it depends on the kernel capability, it isn't randomly issued on
 kernels where we expect it to fail.

 So while it is quite a change, it seems safe to me.

[Other Info]

 * I know, one could as well call that a "feature", but it really is a
   performance bug fix more than anything else. Also the SRU policy
   allows exploitation/toleration of new HW, especially for LTS
   releases. Therefore I think this is fine as SRU.

[1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a32295c612c57990d17fb0f41e7134394b2f35f6
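A minimal sketch for checking, on a given host, whether the new mapping path can be expected to be active (this assumes the kernel capability from [1] first shipped in v4.16; the exact version boundary is an assumption):

  host$ kver=$(uname -r | cut -d- -f1)
  host$ dpkg --compare-versions "$kver" ge 4.16 \
          && echo "kernel should expose the MSI-X mappable capability [1] -> expect the speedup" \
          || echo "pre-4.16 kernel: qemu keeps the old chunked/emulated MSI-X path"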
== Comment: #0 - Murilo Opsfelder Araujo - 2019-10-11 14:16:14 ==

---Problem Description---
Back-port the following patches to Bionic QEMU to improve NVMe guest performance by more than 200%:

"vfio-pci: Allow mmap of MSIX BAR"
https://git.qemu.org/?p=qemu.git;a=commit;h=ae0215b2bb56a9d5321a185dde133bfdd306a4c0

"ppc/spapr, vfio: Turn off MSIX emulation for VFIO devices"
https://git.qemu.org/?p=qemu.git;a=commit;h=fcad0d2121976df4b422b4007a5eb7fcaac01134

---uname output---
na

---Additional Hardware Info---
0030:01:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller 172Xa/172Xb (rev 01)

Machine Type = AC922

---Debugger---
A debugger is not configured

---Steps to Reproduce---
Install or set up a guest image and boot it.

Once the guest is running, pass through the NVMe disk to the guest using the XML:

host$ cat nvme-disk.xml
<hostdev mode='subsystem' type='pci' managed='no'>
  <driver name='vfio'/>
  <source>
    <address domain='0x0030' bus='0x01' slot='0x00' function='0x0'/>
  </source>
</hostdev>

host$ virsh attach-device <domain> nvme-disk.xml --live

On the guest, run fio benchmarks:

guest$ fio --direct=1 --rw=randrw --refill_buffers --norandommap --randrepeat=0 --ioengine=libaio --bs=4k --rwmixread=100 --iodepth=16 --runtime=60 --name=job1 --filename=/dev/nvme0n1 --numjobs=4

Results are similar with numjobs=4 and numjobs=64, respectively:

   READ: bw=385MiB/s (404MB/s), 78.0MiB/s-115MiB/s (81.8MB/s-120MB/s), io=11.3GiB (12.1GB), run=30001-30001msec
   READ: bw=382MiB/s (400MB/s), 2684KiB/s-12.6MiB/s (2749kB/s-13.2MB/s), io=11.2GiB (12.0GB), run=30001-30009msec

With the two patches applied, performance improved significantly for the numjobs=4 and numjobs=64 cases, respectively:

   READ: bw=1191MiB/s (1249MB/s), 285MiB/s-309MiB/s (299MB/s-324MB/s), io=34.9GiB (37.5GB), run=30001-30001msec
   READ: bw=4273MiB/s (4481MB/s), 49.7MiB/s-113MiB/s (52.1MB/s-119MB/s), io=125GiB (134GB), run=30001-30005msec

Userspace tool common name: qemu
Userspace rpm: qemu

The userspace tool has the following bit modes: 64-bit

Userspace tool obtained from project website: na

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1847948

Title:
  Improve NVMe guest performance on Bionic QEMU

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-power-systems/+bug/1847948/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs