@Murilo: I was wondering - I think we might also want/need [1] so that we do
not run into fatal false positives. What is your opinion on this?

[1]:
https://git.qemu.org/?p=qemu.git;a=commit;h=567b5b309abe744b1098018a2eb157e7109c9f30

** Description changed:

+ [Impact]
+ 
+  * In the past qemu has generally not allowed mmap of the MSI-X BAR on
+    VFIO devices. But there are platforms (like ppc64 spapr) that can and
+    want to do exactly that.
+ 
+  * Backport two patches from upstream (included since qemu 2.12 / Disco).
+ 
+  * This yields a tremendous speedup, especially useful with page sizes
+    bigger than 4k, since the BAR no longer has to be split into chunks and
+    the guest gets direct MMIO access.
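+ 
+    (Illustration only, not part of the original report: the host base page
+    size can be checked as shown below; on ppc64el it is typically 64k,
+    which is where the chunked access hurt most.)
+ 
+    host$ getconf PAGESIZE      # typically 65536 on a 64k-page ppc64el host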
+ 
+ [Test Case]
+ 
+  * On ppc64, pass an NVMe device through to the guest and run I/O
+    benchmarks; see below for details on how to set that up (a short
+    host-side binding sketch follows right after this item).
+    Note that this needs the HWE kernel or another kernel fixup for [1].
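+ 
+    Since the XML below uses managed='no', the device has to be bound to
+    vfio-pci by hand first. A minimal sketch, assuming the device address
+    0030:01:00.0 from the lspci output below (these commands are not part
+    of the original test steps):
+ 
+    host$ sudo modprobe vfio-pci
+    host$ echo 0030:01:00.0 | sudo tee /sys/bus/pci/devices/0030:01:00.0/driver/unbind
+    host$ echo vfio-pci | sudo tee /sys/bus/pci/devices/0030:01:00.0/driver_override
+    host$ echo 0030:01:00.0 | sudo tee /sys/bus/pci/drivers_probe
+    host$ readlink /sys/bus/pci/devices/0030:01:00.0/driver   # should now point at vfio-pci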
+ 
+ [Regression Potential]
+ 
+  * Changes:
+    a) if the host driver allows mapping of the MSI-X data, the entire BAR
+       is mapped. This is only done if the kernel reports that capability
+       [1], which ensures that qemu only exposes the new behavior on
+       kernels able to support it (safe against regressions in that
+       regard).
+    b) on ppc64, MSI-X emulation is disabled for VFIO devices; this is
+       local to just this HW and will not affect other HW.
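+ 
+    To check whether the new behavior is actually active on a given setup
+    (a hedged sketch, not part of the original report; the exact monitor
+    output differs between qemu versions), the guest memory layout can be
+    inspected once the device is attached:
+ 
+    host$ virsh qemu-monitor-command <domain> --hmp 'info mtree'
+ 
+    With the backport and a capable kernel, the MSI-X BAR of the
+    passed-through device should show up as one directly mapped region
+    instead of being split around the MSI-X table.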
+ 
+    Generally, the regressions that come to mind are slight changes in
+    behavior (real HW vs. the former emulation) that could cause trouble
+    on some weird/old guests. But that is limited to PPC, where only a
+    small set of certified HW is really allowed anyway.
+ 
+    The mapping that might now also be set up on other platforms should
+    not consume too much extra memory as long as it isn't used. Further,
+    since it depends on the kernel capability, it isn't randomly issued on
+    kernels where we would expect it to fail.
+ 
+    So while it is quite a change, it seems safe to me.
+ 
+ [Other Info]
+  
+  * I know, one could just as well call this a "feature", but it really is
+    a performance bug fix more than anything else. Also, the SRU policy
+    allows toleration/exploitation of new HW, especially for LTS releases.
+    Therefore I think this is fine as an SRU.
+ 
+ [1]:
+ https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a32295c612c57990d17fb0f41e7134394b2f35f6
+ 
+ 
  == Comment: #0 - Murilo Opsfelder Araujo  - 2019-10-11 14:16:14 ==
  
  ---Problem Description---
  Back-port the following patches to Bionic QEMU to improve NVMe guest performance by more than 200%:
  
  "vfio-pci: Allow mmap of MSIX BAR"
  https://git.qemu.org/?p=qemu.git;a=commit;h=ae0215b2bb56a9d5321a185dde133bfdd306a4c0
  
  "ppc/spapr, vfio: Turn off MSIX emulation for VFIO devices"
  https://git.qemu.org/?p=qemu.git;a=commit;h=fcad0d2121976df4b422b4007a5eb7fcaac01134

  ---uname output---
  na

  ---Additional Hardware Info---
  0030:01:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller 172Xa/172Xb (rev 01)
  
  Machine Type = AC922

  ---Debugger---
  A debugger is not configured

  ---Steps to Reproduce---
   Install or set up a guest image and boot it.
  
  Once the guest is running, pass the NVMe disk through to the guest using
  the following XML:
  
  host$ cat nvme-disk.xml
  <hostdev mode='subsystem' type='pci' managed='no'>
    <driver name='vfio'/>
    <source>
      <address domain='0x0030' bus='0x01' slot='0x00' function='0x0'/>
    </source>
  </hostdev>
  
  host$ virsh attach-device <domain> nvme-disk.xml --live
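  (Optional sanity check, not part of the original steps: confirm from the
  guest that the device actually arrived before benchmarking; the nvme0n1
  name below is an assumption.)

  guest$ lspci | grep -i 'non-volatile'
  guest$ lsblk /dev/nvme0n1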
  
  On the guest, run fio benchmarks:
  
  guest$ fio --direct=1 --rw=randrw --refill_buffers --norandommap \
    --randrepeat=0 --ioengine=libaio --bs=4k --rwmixread=100 --iodepth=16 \
    --runtime=60 --name=job1 --filename=/dev/nvme0n1 --numjobs=4
  
  Results are similar with numjobs=4 and numjobs=64, respectively:
  
     READ: bw=385MiB/s (404MB/s), 78.0MiB/s-115MiB/s (81.8MB/s-120MB/s), io=11.3GiB (12.1GB), run=30001-30001msec
     READ: bw=382MiB/s (400MB/s), 2684KiB/s-12.6MiB/s (2749kB/s-13.2MB/s), io=11.2GiB (12.0GB), run=30001-30009msec
  
  With the two patches applied, performance improved significantly for
  numjobs=4 and numjobs=64 cases, respectively:
  
     READ: bw=1191MiB/s (1249MB/s), 285MiB/s-309MiB/s (299MB/s-324MB/s), io=34.9GiB (37.5GB), run=30001-30001msec
     READ: bw=4273MiB/s (4481MB/s), 49.7MiB/s-113MiB/s (52.1MB/s-119MB/s), io=125GiB (134GB), run=30001-30005msec
  
  Userspace tool common name: qemu

  Userspace rpm: qemu

  The userspace tool has the following bit modes: 64-bit
  
  Userspace tool obtained from project website:  na

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1847948

Title:
  Improve NVMe guest performance on Bionic QEMU

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-power-systems/+bug/1847948/+subscriptions
