We were reproducing this multiple times a day on several of our EC2 M5
instances. Interestingly, our least-loaded instances hit the bug more
often than our heavily loaded ones.

We've since switched to M4 servers and do not have time to flip back and
help test this right now.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1788035

Title:
  nvme: avoid cqe corruption

Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Xenial:
  Fix Released

Bug description:
  To address a customer-reported NVMe issue on instance types (notably
  c5 and m5) that expose EBS volumes as NVMe devices, the following
  commit from mainline v4.6 should be backported to Xenial:

  d783e0bd02e700e7a893ef4fa71c69438ac1c276 nvme: avoid cqe corruption
  when update at the same time as read

  dmesg sample:

  [Wed Aug 15 01:11:21 2018] nvme 0000:00:1f.0: I/O 8 QID 1 timeout, aborting
  [Wed Aug 15 01:11:21 2018] nvme 0000:00:1f.0: I/O 9 QID 1 timeout, aborting
  [Wed Aug 15 01:11:21 2018] nvme 0000:00:1f.0: I/O 21 QID 2 timeout, aborting
  [Wed Aug 15 01:11:32 2018] nvme 0000:00:1f.0: I/O 10 QID 1 timeout, aborting
  [Wed Aug 15 01:11:51 2018] nvme 0000:00:1f.0: I/O 8 QID 1 timeout, reset controller
  [Wed Aug 15 01:11:51 2018] nvme nvme1: Abort status: 0x2
  [Wed Aug 15 01:11:51 2018] nvme nvme1: Abort status: 0x2
  [Wed Aug 15 01:11:51 2018] nvme nvme1: Abort status: 0x2
  [Wed Aug 15 01:11:51 2018] nvme 0000:00:1f.0: Cancelling I/O 21 QID 2
  [Wed Aug 15 01:11:51 2018] nvme 0000:00:1f.0: completing aborted command with status: 0007
  [Wed Aug 15 01:11:51 2018] blk_update_request: I/O error, dev nvme1n1, sector 83887751
  [Wed Aug 15 01:11:51 2018] blk_update_request: I/O error, dev nvme1n1, sector 83887751
  [Wed Aug 15 01:11:51 2018] nvme 0000:00:1f.0: Cancelling I/O 22 QID 2
  [Wed Aug 15 01:11:51 2018] blk_update_request: I/O error, dev nvme1n1, sector 83887767
  [Wed Aug 15 01:11:51 2018] blk_update_request: I/O error, dev nvme1n1, sector 83887767
  [Wed Aug 15 01:11:51 2018] nvme 0000:00:1f.0: Cancelling I/O 23 QID 2
  [Wed Aug 15 01:11:51 2018] blk_update_request: I/O error, dev nvme1n1, sector 83887769
  [Wed Aug 15 01:11:51 2018] blk_update_request: I/O error, dev nvme1n1, sector 83887769
  [Wed Aug 15 01:11:51 2018] nvme 0000:00:1f.0: Cancelling I/O 8 QID 1
  [Wed Aug 15 01:11:51 2018] nvme 0000:00:1f.0: Cancelling I/O 9 QID 1
  [Wed Aug 15 01:11:51 2018] nvme 0000:00:1f.0: completing aborted command with status: 0007
  [Wed Aug 15 01:11:51 2018] blk_update_request: I/O error, dev nvme1n1, sector 41943136
  [Wed Aug 15 01:11:51 2018] nvme 0000:00:1f.0: Cancelling I/O 10 QID 1
  [Wed Aug 15 01:11:51 2018] nvme 0000:00:1f.0: completing aborted command with status: 0007
  [Wed Aug 15 01:11:51 2018] blk_update_request: I/O error, dev nvme1n1, sector 6976
  [Wed Aug 15 01:11:51 2018] nvme 0000:00:1f.0: Cancelling I/O 22 QID 1
  [Wed Aug 15 01:11:51 2018] nvme 0000:00:1f.0: Cancelling I/O 23 QID 1
  [Wed Aug 15 01:11:51 2018] nvme 0000:00:1f.0: Cancelling I/O 24 QID 1
  [Wed Aug 15 01:11:51 2018] nvme 0000:00:1f.0: Cancelling I/O 25 QID 1
  [Wed Aug 15 01:11:51 2018] nvme 0000:00:1f.0: Cancelling I/O 2 QID 0
  [Wed Aug 15 01:11:51 2018] nvme nvme1: Abort status: 0x7
  [Wed Aug 15 01:11:51 2018] nvme 0000:00:1f.0: completing aborted command with status: fffffffc
  [Wed Aug 15 01:11:51 2018] blk_update_request: I/O error, dev nvme1n1, sector 96
  [Wed Aug 15 01:11:51 2018] XFS (nvme1n1): metadata I/O error: block 0x5000687 ("xlog_iodone") error 5 numblks 64
  [Wed Aug 15 01:11:51 2018] XFS (nvme1n1): xfs_do_force_shutdown(0x2) called from line 1197 of file /build/linux-c2Z51P/linux-4.4.0/fs/xfs/xfs_log.c. Return address = 0xffffffffc075d428
  [Wed Aug 15 01:11:51 2018] XFS (nvme1n1): xfs_log_force: error -5 returned.
  [Wed Aug 15 01:11:51 2018] XFS (nvme1n1): Log I/O Error Detected. Shutting down filesystem
  [Wed Aug 15 01:11:51 2018] XFS (nvme1n1): Please umount the filesystem and rectify the problem(s)
  [Wed Aug 15 01:11:51 2018] Buffer I/O error on dev nvme1n1, logical block 872, lost async page write
  [Wed Aug 15 01:11:51 2018] XFS (nvme1n1): xfs_imap_to_bp: xfs_trans_read_buf() returned error -5.
  [Wed Aug 15 01:11:51 2018] XFS (nvme1n1): xfs_iunlink_remove: xfs_imap_to_bp returned error -5.
  [Wed Aug 15 01:11:51 2018] Buffer I/O error on dev nvme1n1, logical block 873, lost async page write
  [Wed Aug 15 01:11:51 2018] Buffer I/O error on dev nvme1n1, logical block 874, lost async page write
  [Wed Aug 15 01:11:51 2018] Buffer I/O error on dev nvme1n1, logical block 875, lost async page write
  [Wed Aug 15 01:11:51 2018] Buffer I/O error on dev nvme1n1, logical block 876, lost async page write
  [Wed Aug 15 01:11:51 2018] Buffer I/O error on dev nvme1n1, logical block 877, lost async page write
  [Wed Aug 15 01:11:51 2018] Buffer I/O error on dev nvme1n1, logical block 878, lost async page write
  [Wed Aug 15 01:11:51 2018] Buffer I/O error on dev nvme1n1, logical block 879, lost async page write
  [Wed Aug 15 01:11:51 2018] Buffer I/O error on dev nvme1n1, logical block 880, lost async page write
  [Wed Aug 15 01:11:51 2018] Buffer I/O error on dev nvme1n1, logical block 881, lost async page write
  [Wed Aug 15 01:11:51 2018] XFS (nvme1n1): metadata I/O error: block 0x5000697 ("xlog_iodone") error 5 numblks 64
  [Wed Aug 15 01:11:51 2018] XFS (nvme1n1): xfs_log_force: error -5 returned.
  [Wed Aug 15 01:11:51 2018] XFS (nvme1n1): xfs_do_force_shutdown(0x2) called from line 1197 of file /build/linux-c2Z51P/linux-4.4.0/fs/xfs/xfs_log.c. Return address = 0xffffffffc075d428
  [Wed Aug 15 01:11:51 2018] XFS (nvme1n1): metadata I/O error: block 0x5000699 ("xlog_iodone") error 5 numblks 64
  [Wed Aug 15 01:11:51 2018] XFS (nvme1n1): xfs_do_force_shutdown(0x2) called from line 1197 of file /build/linux-c2Z51P/linux-4.4.0/fs/xfs/xfs_log.c. Return address = 0xffffffffc075d428
  [Wed Aug 15 01:12:20 2018] XFS (nvme1n1): xfs_log_force: error -5 returned.
  [Wed Aug 15 01:12:22 2018] nvme 0000:00:1f.0: I/O 22 QID 1 timeout, aborting
  [Wed Aug 15 01:12:22 2018] nvme 0000:00:1f.0: I/O 23 QID 1 timeout, aborting
  [Wed Aug 15 01:12:22 2018] nvme 0000:00:1f.0: I/O 24 QID 1 timeout, aborting
  [Wed Aug 15 01:12:22 2018] nvme 0000:00:1f.0: I/O 25 QID 1 timeout, aborting
  [Wed Aug 15 01:12:22 2018] nvme 0000:00:1f.0: I/O 24 QID 2 timeout, aborting
  [Wed Aug 15 01:12:22 2018] nvme nvme1: Abort status: 0x2
  [Wed Aug 15 01:12:22 2018] nvme nvme1: Abort status: 0x2
  [Wed Aug 15 01:12:22 2018] nvme nvme1: Abort status: 0x2
  [Wed Aug 15 01:12:22 2018] nvme nvme1: Abort status: 0x2
  [Wed Aug 15 01:12:22 2018] nvme nvme1: Abort status: 0x2
  [Wed Aug 15 01:12:50 2018] XFS (nvme1n1): xfs_log_force: error -5 returned.
  [Wed Aug 15 01:12:52 2018] nvme 0000:00:1f.0: I/O 22 QID 1 timeout, reset controller
  [Wed Aug 15 01:13:21 2018] XFS (nvme1n1): xfs_log_force: error -5 returned.
  [Wed Aug 15 01:13:51 2018] XFS (nvme1n1): xfs_log_force: error -5 returned.
  [Wed Aug 15 01:14:21 2018] XFS (nvme1n1): xfs_log_force: error -5 returned.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1788035/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
