Public bug reported:

[Impact]

A user reported a bug in the NVMe over TCP driver affecting aarch64 architectures. On weakly ordered architectures the compiler can reorder the instructions reading/setting queue->cmd and queue->rcv_state, which can lead to dropped IOs and IO hangs.

The bug has been fixed upstream by [1], which was introduced in 6.14.

[Test Plan]

The bug is reproducible on arm64 architectures. Set up NVMe over TCP. Using an arm-based machine as the target, run a fio test with the following config:

[global]
ioengine=libaio
max_latency=45s
end_fsync=1
create_serialize=0
size=3200m
directory=/path/to/storage
ramp_time=30
lat_percentiles=1
direct=1
filename_format=fiodata.$jobnum
verify_dump=1
numjobs=16
fallocate=native
stonewall=1
group_reporting=1
file_service_type=random
iodepth=16
runtime=5m
time_based=1

[random_0_100_4k]
bs=4k
rw=randwrite

[Where problems could occur]

To fix the bug, the patch reads/writes queue->cmd with READ_ONCE/WRITE_ONCE and queue->rcv_state with smp_load_acquire and smp_store_release. The patch modifies the nvme-tcp driver, so any potential regressions would affect setups using NVMe over TCP.

[Other Info]

The user is able to reproduce the issue with kernels 5.19 (no longer supported), 6.8 and 6.11.

[1] https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/nvme?id=a16f88964c647103dad7743a484b216d488a6352

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: New

** Affects: linux (Ubuntu Noble)
     Importance: Undecided
         Status: New

** Affects: linux (Ubuntu Oracular)
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2106381

Title:
  nvme/tcp hangs IO on arm

Status in linux package in Ubuntu:
  New
Status in linux source package in Noble:
  New
Status in linux source package in Oracular:
  New
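
For reference, the ordering described under [Where problems could occur] can be illustrated in userspace. The following is only a rough sketch using C11 atomics and pthreads, not the upstream patch: relaxed atomic accesses stand in for READ_ONCE/WRITE_ONCE, and release/acquire accesses stand in for smp_store_release/smp_load_acquire. The struct layouts, the state names RECV_PDU/RECV_DATA, and the function run_handoff_demo are hypothetical and chosen for illustration; only the field names cmd and rcv_state come from the report.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the driver's receive states. */
enum rcv_state { RECV_PDU, RECV_DATA };

struct cmd { int tag; };

struct queue {
    struct cmd *_Atomic cmd;   /* analogue of queue->cmd */
    _Atomic int rcv_state;     /* analogue of queue->rcv_state */
};

struct demo {
    struct queue q;
    struct cmd *cmds;
    int n;
    int dropped;               /* written only by the completer thread */
};

/* Publishes a command, then flips the state with release semantics. */
static void *receiver(void *arg)
{
    struct demo *d = arg;

    for (int i = 0; i < d->n; i++) {
        /* Wait until the previous command has been consumed. */
        while (atomic_load_explicit(&d->q.rcv_state,
                                    memory_order_acquire) != RECV_PDU)
            sched_yield();

        d->cmds[i].tag = i;
        /* Relaxed store: rough analogue of WRITE_ONCE(queue->cmd, ...). */
        atomic_store_explicit(&d->q.cmd, &d->cmds[i], memory_order_relaxed);
        /* Release store: everything above is visible before the new state. */
        atomic_store_explicit(&d->q.rcv_state, RECV_DATA,
                              memory_order_release);
    }
    return NULL;
}

/* Observes the state with acquire semantics, then reads the command. */
static void *completer(void *arg)
{
    struct demo *d = arg;

    for (int i = 0; i < d->n; i++) {
        while (atomic_load_explicit(&d->q.rcv_state,
                                    memory_order_acquire) != RECV_DATA)
            sched_yield();

        /* Relaxed load: rough analogue of READ_ONCE(queue->cmd). */
        struct cmd *c = atomic_load_explicit(&d->q.cmd, memory_order_relaxed);
        if (c == NULL || c->tag != i)
            d->dropped++;      /* would correspond to a lost response */

        atomic_store_explicit(&d->q.cmd, (struct cmd *)NULL,
                              memory_order_relaxed);
        atomic_store_explicit(&d->q.rcv_state, RECV_PDU,
                              memory_order_release);
    }
    return NULL;
}

/* Hands n commands between two threads; returns how many were missed. */
int run_handoff_demo(int n)
{
    assert(n > 0);

    struct demo d = { .n = n, .dropped = 0 };
    pthread_t rx, done;

    d.cmds = calloc((size_t)n, sizeof(*d.cmds));
    if (d.cmds == NULL)
        return -1;

    atomic_init(&d.q.cmd, (struct cmd *)NULL);
    atomic_init(&d.q.rcv_state, RECV_PDU);

    /* Keep the sketch short: give up entirely if a thread cannot start. */
    if (pthread_create(&rx, NULL, receiver, &d) != 0)
        abort();
    if (pthread_create(&done, NULL, completer, &d) != 0)
        abort();

    pthread_join(rx, NULL);
    pthread_join(done, NULL);
    free(d.cmds);
    return d.dropped;
}
```

Built with something like `cc -std=c11 -pthread`, a call such as run_handoff_demo(1000) should return 0, because the acquire load of rcv_state guarantees that the earlier store to cmd is visible. If the release/acquire pair were replaced with plain accesses, the compiler or CPU would be free to reorder the two stores, which is the failure mode the report describes.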