I am sorry about the delay in uploading the logs.

Update:
* I wanted to try several things before concluding that this is a kernel
defect.

* A few firmware updates (including BIOS) were done.

* After these firmware updates, I can no longer reproduce the crash/hang.
Instead, a couple of my stress threads are killed by the oom-killer, and
the other threads exit gracefully after the 20-hour I/O stress run. This
seems OK to me. I have completed 5 full runs so far.

* # uname -r
4.13.0-32-generic
This is the same kernel on which the hang was previously seen during the
20-hour stress run.
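
To tell the two outcomes apart in a kernel log (threads killed by the
oom-killer versus tasks reported as hung), a minimal sketch like the one
below could be used. The regular expressions are assumptions based on the
usual wording of these kernel messages and may need adjusting for a given
kernel version. `classify` is a hypothetical helper name and is not part of
the stress tool.

```python
import re

# Assumed message formats; kernel wording differs between versions,
# so treat these patterns as a starting point, not a reference.
OOM_RE = re.compile(r"Out of memory: Kill(?:ed)? process (\d+) \(([^)]+)\)")
HUNG_RE = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")


def classify(lines):
    """Return (oom_kills, hung_tasks) found in an iterable of log lines.

    oom_kills  -> list of (pid, process_name)
    hung_tasks -> list of (task_name, pid)
    """
    oom_kills, hung_tasks = [], []
    for line in lines:
        m = OOM_RE.search(line)
        if m:
            oom_kills.append((int(m.group(1)), m.group(2)))
            continue
        m = HUNG_RE.search(line)
        if m:
            hung_tasks.append((m.group(1), int(m.group(2))))
    return oom_kills, hung_tasks
```

Passing the lines of a saved kernel log to `classify` gives one list per
kind of event. A run with only oom-killer entries would match what I see
after the firmware updates, while hung-task entries would point back to the
original problem.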

Give me a couple of days. I will update here on whether this looks like
a genuine Xenial defect.

Thanks,
Sujith

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1749746

Title:
  DellEMC AMD servers hang when running IO stress on NVMe disks

Status in linux package in Ubuntu:
  Incomplete

Bug description:
  Description:
  On Ubuntu 16.04 running the 4.13.0-32 kernel, when file I/O stress is run
  on multiple NVMe disks (with ext4 partitions), the system hangs with
  multiple kernel crashes in the logs.

  Steps:
  1. Set up a DellEMC AMD server with a few NVMe disks.
  2. Run the file I/O stress on these disks for 24 hours.
  3. Observe that the system becomes unresponsive after a few minutes or hours.

  Additional Info:
  * Stress ran fine for 24hrs with 4.12.0-041200-generic.

  * Stress ran fine for 13hrs with 4.10.0-28-generic. The run had to be
  stopped manually for unrelated reasons.

  * Stress fails with linux-image-4.13.0-25-generic.

  Attaching the logs.
  Will update here once we have more data.
