This bug is awaiting verification that the linux-nvidia-
tegra/6.8.0-1007.7 kernel in -proposed solves the problem. Please test
the kernel and update this bug with the results. If the problem is
solved, change the tag 'verification-needed-noble-linux-nvidia-tegra' to
'verification-done-noble-linux-nvidia-tegra'. If the problem still
exists, change the tag 'verification-needed-noble-linux-nvidia-tegra' to
'verification-failed-noble-linux-nvidia-tegra'.


If verification is not done by 5 working days from today, this fix will
be dropped from the source code, and this bug will be closed.


See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on
how to enable and use -proposed. Thank you!


** Tags added: kernel-spammed-noble-linux-nvidia-tegra-v2 
verification-needed-noble-linux-nvidia-tegra

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2106381

Title:
  nvme/tcp hangs IO on arm

Status in linux package in Ubuntu:
  New
Status in linux source package in Noble:
  In Progress
Status in linux source package in Oracular:
  Fix Released

Bug description:
  [Impact]

  A user reported a bug in the nvme over tcp driver affecting aarch64 architectures.
  On weakly ordered architectures the compiler can reorder the instructions
  reading/setting queue->cmd and queue->rcv_state, which can lead to dropped
  IOs and hanging IO.

  The bug has been fixed upstream in [1]; the fix landed in 6.14.

  [Test Plan]
  The bug is reproducible on arm64 architectures.
  Set up nvme over tcp.

  Using an arm-based machine as the target, run a fio test with the
  following config:

  [global]
  ioengine=libaio
  max_latency=45s
  end_fsync=1
  create_serialize=0
  size=3200m
  directory=/path/to/storage
  ramp_time=30
  lat_percentiles=1
  direct=1
  filename_format=fiodata.$jobnum
  verify_dump=1
  numjobs=16
  fallocate=native
  stonewall=1
  group_reporting=1
  file_service_type=random
  iodepth=16
  runtime=5m
  time_based=1
  [random_0_100_4k]
  bs=4k
  rw=randwrite

  [Where problems could occur]
  To fix the bug, the patch reads/writes queue->cmd with READ_ONCE/WRITE_ONCE
  and queue->rcv_state with smp_load_acquire and smp_store_release.
  The patch modifies the nvme-tcp driver, so any potential regressions
  would affect setups using nvme over tcp.
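
  For readers unfamiliar with acquire/release ordering, the following is a
  minimal userspace sketch of the general pattern, not the driver code. It
  assumes C11 atomics and pthreads in place of the kernel's
  smp_store_release/smp_load_acquire, and the names demo_queue, demo_producer,
  demo_consumer and demo_handoff are hypothetical. The release store on the
  state publishes the earlier write to cmd, and the acquire load on the state
  guarantees the reader sees that write.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver's receive states. */
enum demo_state { DEMO_IDLE = 0, DEMO_READY = 1 };

/* Hypothetical stand-in for the queue: a payload plus a state flag. */
struct demo_queue {
    int cmd;              /* plays the role of queue->cmd */
    atomic_int rcv_state; /* plays the role of queue->rcv_state */
};

/* Writer: fill in the payload first, then publish it with a release store. */
static void *demo_producer(void *arg)
{
    struct demo_queue *q = arg;

    q->cmd = 42;
    /* Release: the store to cmd above cannot be reordered after this store. */
    atomic_store_explicit(&q->rcv_state, DEMO_READY, memory_order_release);
    return NULL;
}

/* Reader: wait for the state with an acquire load, then read the payload. */
static void *demo_consumer(void *arg)
{
    struct demo_queue *q = arg;

    /* Acquire: the read of cmd below cannot be reordered before this load. */
    while (atomic_load_explicit(&q->rcv_state, memory_order_acquire) != DEMO_READY)
        ;
    return (void *)(intptr_t)q->cmd;
}

/* Runs one handoff and returns the cmd value the consumer observed,
 * or -1 if a thread could not be created. */
int demo_handoff(void)
{
    struct demo_queue q = { .cmd = 0 };
    pthread_t producer, consumer;
    void *seen = NULL;
    int rc;

    atomic_init(&q.rcv_state, DEMO_IDLE);

    if (pthread_create(&consumer, NULL, demo_consumer, &q) != 0)
        return -1;
    if (pthread_create(&producer, NULL, demo_producer, &q) != 0) {
        /* Unblock the spinning consumer before bailing out. */
        atomic_store_explicit(&q.rcv_state, DEMO_READY, memory_order_release);
        pthread_join(consumer, NULL);
        return -1;
    }

    rc = pthread_join(producer, NULL);
    assert(rc == 0);
    rc = pthread_join(consumer, &seen);
    assert(rc == 0);
    (void)rc;

    return (int)(intptr_t)seen;
}
```

  With both orderings in place, demo_handoff() can only return the value the
  producer wrote. If both were weakened to relaxed accesses, a weakly ordered
  CPU such as aarch64 would be allowed to let the consumer observe the READY
  state before the new cmd value.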

  [Other Info]

  The user is able to reproduce the issue with kernels 5.19 (no longer
  supported), 6.8 and 6.11.

  For 6.11 Oracular the patch is pulled in as part of upstream stable
  patchset 2025-04-15, LP #2107437,
  https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2107437

  [1] https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/nvme?id=a16f88964c647103dad7743a484b216d488a6352

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2106381/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
