https://bugs.launchpad.net/bugs/2115738
Title:
I/O performance regression on NVMes under same bridge (dual port nvme)
Status in linux package in Ubuntu:
New
Status in linux source package in Oracular:
New
Status in linux source package in Plucky:
New
Status in linux source package in Questing:
New
Bug description:
[Description]
A performance regression has been reported when running fio against two NVMe
devices under the same PCI bridge (dual port NVMe).
The issue was initially reported on the 6.11 HWE kernel for Noble.
The performance regression was introduced in the 6.10 upstream kernel and is
still present in 6.16 (build at commit e540341508ce2f6e27810106253d5).
Bisection pointed to commit 129dab6e1286 ("iommu/vt-d: Use
cache_tag_flush_range_np() in iotlb_sync_map").
In our tests we observe ~6150 MiB/s when the NVMe devices are on
different bridges and ~4985 MiB/s when they are under the same bridge.
Before the offending commit we observe ~6150 MiB/s regardless of NVMe
device placement.
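Since the bisected commit is in the intel-iommu (VT-d) driver, it is worth
confirming that VT-d is actually active on the machine under test before
comparing numbers. A minimal sketch using only generic commands:
# dmesg | grep -iE 'DMAR|IOMMU' | head
# cat /proc/cmdline
(the second command shows whether any intel_iommu= or iommu= overrides are set)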
[Test Case]
We can reproduce the issue on GCP on the Z3 metal instance type
(z3-highmem-192-highlssd-metal) [1].
You need two NVMe devices under the same bridge, e.g.:
# nvme list -v
...
Device  SN                   MN            FR        TxPort  Address       Slot  Subsystem      Namespaces
------- -------------------- ------------- --------- ------- ------------- ----- -------------- ----------
nvme0   nvme_card-pd         nvme_card-pd  (null)    pcie    0000:05:00.1        nvme-subsys0   nvme0n1
nvme1   3DE4D285C21A7C001.0  nvme_card     00000000  pcie    0000:3d:00.0        nvme-subsys1   nvme1n1
nvme10  3DE4D285C21A7C001.1  nvme_card     00000000  pcie    0000:3d:00.1        nvme-subsys10  nvme10n1
nvme11  3DE4D285C2027C000.0  nvme_card     00000000  pcie    0000:3e:00.0        nvme-subsys11  nvme11n1
nvme12  3DE4D285C2027C000.1  nvme_card     00000000  pcie    0000:3e:00.1        nvme-subsys12  nvme12n1
nvme2   3DE4D285C2368C001.0  nvme_card     00000000  pcie    0000:b7:00.0        nvme-subsys2   nvme2n1
nvme3   3DE4D285C22A74001.0  nvme_card     00000000  pcie    0000:86:00.0        nvme-subsys3   nvme3n1
nvme4   3DE4D285C22A74001.1  nvme_card     00000000  pcie    0000:86:00.1        nvme-subsys4   nvme4n1
nvme5   3DE4D285C2368C001.1  nvme_card     00000000  pcie    0000:b7:00.1        nvme-subsys5   nvme5n1
nvme6   3DE4D285C21274000.0  nvme_card     00000000  pcie    0000:87:00.0        nvme-subsys6   nvme6n1
nvme7   3DE4D285C21094000.0  nvme_card     00000000  pcie    0000:b8:00.0        nvme-subsys7   nvme7n1
nvme8   3DE4D285C21274000.1  nvme_card     00000000  pcie    0000:87:00.1        nvme-subsys8   nvme8n1
nvme9   3DE4D285C21094000.1  nvme_card     00000000  pcie    0000:b8:00.1        nvme-subsys9   nvme9n1
...
For the output above, drives nvme1n1 and nvme10n1 are under the same
bridge, and judging by the SN they appear to be the two ports of a dual
port NVMe.
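To double-check which controllers share a bridge, the PCI topology can be
inspected directly. A short sketch, assuming the controller names from the
listing above (nvme1 at 0000:3d:00.0 and nvme10 at 0000:3d:00.1):
# readlink -f /sys/class/nvme/nvme1/device
# readlink -f /sys/class/nvme/nvme10/device
(both resolved paths should pass through the same upstream bridge and end in
0000:3d:00.0 and 0000:3d:00.1 respectively)
# lspci -tv
(shows the same information as a tree of bridges and endpoints)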
- Under the same bridge
Run fio against nvme1n1 and nvme10n1; observe ~4897 MiB/s after a short
initial spike at ~6150 MiB/s.
# sudo fio --readwrite=randread --blocksize=4k --iodepth=32 --numjobs=8 \
    --time_based --runtime=40 --ioengine=libaio --direct=1 --group_reporting \
    --new_group --name=job1 --filename=/dev/nvme1n1 \
    --new_group --name=job2 --filename=/dev/nvme10n1
...
Jobs: 16 (f=16): [r(16)][100.0%][r=4897MiB/s][r=1254k IOPS][eta 00m:00s]
...
- Under different bridges
Run fio against nvme1n1 and nvme11n1; observe ~6153 MiB/s.
# sudo fio --readwrite=randread --blocksize=4k --iodepth=32 --numjobs=8 \
    --time_based --runtime=40 --ioengine=libaio --direct=1 --group_reporting \
    --new_group --name=job1 --filename=/dev/nvme1n1 \
    --new_group --name=job2 --filename=/dev/nvme11n1
...
Jobs: 16 (f=16): [r(16)][100.0%][r=6153MiB/s][r=1575k IOPS][eta 00m:00s]
...
** So far, we haven't been able to reproduce it on another machine, but
we suspect it will be reproducible on any machine with a dual port
NVMe.
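One way to confirm that the flush path named in the bisected commit is being
exercised during the fio run is to trace it with ftrace. This is only a
sketch; it assumes tracing is enabled and that cache_tag_flush_range_np shows
up in available_filter_functions:
# cd /sys/kernel/tracing
# echo cache_tag_flush_range_np > set_ftrace_filter
# echo function > current_tracer
(run the fio job in another terminal, then)
# head trace
# echo nop > current_tracer
Frequent hits during the same-bridge run would show the workload going
through the iotlb_sync_map flush path changed by the commit.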
[Other]
Offending commit:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=129dab6e1286525fe5baed860d3dfcd9c6b4b327
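For reference, whether a given mainline release already contains the bisected
commit can be checked in a kernel git checkout (sketch; per the bisection
above, the commit first appears in v6.10):
$ git merge-base --is-ancestor 129dab6e1286525fe5baed860d3dfcd9c6b4b327 v6.10 && echo present
$ git merge-base --is-ancestor 129dab6e1286525fe5baed860d3dfcd9c6b4b327 v6.9 || echo not present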
[1] https://cloud.google.com/compute/docs/storage-optimized-machines#z3_machine_types