Public bug reported:

[Impact]

A user reported a bug in the NVMe over TCP driver affecting aarch64 systems.
On weakly ordered architectures the compiler can reorder the instructions that
read and set queue->cmd and queue->rcv_state, which can lead to dropped IOs and
IO hangs.

The bug has been fixed upstream by [1], which landed in 6.14.

[Test Plan]
The bug is reproducible on arm64 architectures.
Set up NVMe over TCP.

Using an Arm-based machine as the target, run a fio test with the
following config:

[global]
ioengine=libaio
max_latency=45s
end_fsync=1
create_serialize=0
size=3200m
directory=/path/to/storage
ramp_time=30
lat_percentiles=1
direct=1
filename_format=fiodata.$jobnum
verify_dump=1
numjobs=16
fallocate=native
stonewall=1
group_reporting=1
file_service_type=random
iodepth=16
runtime=5m
time_based=1
[random_0_100_4k]
bs=4k
rw=randwrite

[Where problems could occur]
To fix the bug, the patch accesses queue->cmd with READ_ONCE()/WRITE_ONCE()
and queue->rcv_state with smp_load_acquire() and smp_store_release().
The patch modifies the nvme-tcp driver, so any potential regressions would
affect setups using NVMe over TCP.
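
The ordering the patch relies on can be illustrated with a minimal userspace
sketch. This is not the kernel patch: it uses C11 atomics and pthreads as a
rough analogue of smp_store_release()/smp_load_acquire(), and every name in it
(struct demo_queue, demo_writer, demo_reader, demo_run) is hypothetical and
chosen only for illustration. The writer sets cmd and then publishes the state
with release semantics; the reader loads the state with acquire semantics
before touching cmd, so it cannot observe the new state with a stale cmd.

```c
/* Userspace analogue only; assumes C11 atomics and POSIX threads.
 * None of these types or functions exist in the nvme-tcp driver. */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

enum demo_rcv_state { DEMO_RECV_PDU, DEMO_RECV_DONE };

struct demo_cmd { int status; };

struct demo_queue {
    struct demo_cmd *cmd;   /* plain data, published through rcv_state */
    _Atomic int rcv_state;  /* flag that orders access to cmd */
};

static struct demo_cmd demo_cmd_storage = { .status = 42 };

/* Writer: store cmd first, then publish the state with release semantics,
 * so the cmd store cannot be reordered after the state store. */
static void *demo_writer(void *arg)
{
    struct demo_queue *q = arg;

    q->cmd = &demo_cmd_storage;
    atomic_store_explicit(&q->rcv_state, DEMO_RECV_DONE,
                          memory_order_release);
    return NULL;
}

/* Reader: spin on an acquire load of the state. Once DEMO_RECV_DONE is
 * observed, the earlier store to cmd is guaranteed to be visible. */
static int demo_reader(struct demo_queue *q)
{
    while (atomic_load_explicit(&q->rcv_state,
                                memory_order_acquire) != DEMO_RECV_DONE)
        ;
    return q->cmd->status;
}

/* Runs one writer thread against a reader on the calling thread and
 * returns the status the reader saw through the published cmd pointer. */
int demo_run(void)
{
    struct demo_queue q;
    pthread_t writer;
    int rc, status;

    q.cmd = NULL;
    atomic_init(&q.rcv_state, DEMO_RECV_PDU);

    rc = pthread_create(&writer, NULL, demo_writer, &q);
    assert(rc == 0);

    status = demo_reader(&q);
    pthread_join(writer, NULL);
    return status;
}
```

If both accesses to rcv_state used relaxed ordering instead, a weakly ordered
CPU or the compiler would be free to make the state change visible before the
cmd store, which is the kind of reordering described in [Impact].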


[Other Info]

The user is able to reproduce the issue with kernels 5.19 (no longer
supported), 6.8 and 6.11.


[1] 
https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/nvme?id=a16f88964c647103dad7743a484b216d488a6352

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: New

** Affects: linux (Ubuntu Noble)
     Importance: Undecided
         Status: New

** Affects: linux (Ubuntu Oracular)
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2106381

Title:
  nvme/tcp hangs IO on arm

Status in linux package in Ubuntu:
  New
Status in linux source package in Noble:
  New
Status in linux source package in Oracular:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2106381/+subscriptions

