An update: the patch has been released to the -proposed pocket and is available in kernel 4.4.0-1075-aws. To enable the proposed repository, please see https://wiki.ubuntu.com/Testing/EnableProposed. The plan is to release this kernel in the first week of February, once all tests and validations of the proposed package have finished.
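As a quick way to tell whether a given instance already runs a build with the patch, the release string can be compared against the version named above. This is only a minimal sketch: the helper names are made up for illustration, and it assumes that every 4.4.0 "-aws" build numbered 1075 or higher carries the patch, which this message does not state explicitly.

```python
import platform
import re

# Assumption: the fix first ships in the 4.4.0-1075-aws build named above,
# and later 4.4.0-*-aws builds keep it.
FIXED_BUILD = (4, 4, 0, 1075)


def parse_aws_release(release):
    """Return (major, minor, patch, abi) for an '-aws' release string, else None."""
    m = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)-(\d+)-aws", release)
    return tuple(int(g) for g in m.groups()) if m else None


def has_polling_patch(release):
    """True if `release` is a 4.4.0 AWS kernel at or above the patched build."""
    parsed = parse_aws_release(release)
    if parsed is None or parsed[:3] != FIXED_BUILD[:3]:
        # Other flavours and series are outside the scope of this sketch.
        return False
    return parsed[3] >= FIXED_BUILD[3]


if __name__ == "__main__":
    running = platform.release()  # same string that `uname -r` prints
    verdict = "has the patch" if has_polling_patch(running) else "not confirmed patched"
    print(running, verdict)
```

A False result only means the sketch cannot confirm the patch; it says nothing about non-AWS kernels or other series.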
With this kernel, when a timeout occurs the driver polls the completion queue to check whether the timed-out I/O has in fact already completed. Our tests showed that, for this bug, the completed I/O is indeed there, which seems to indicate a missed interrupt. So a kernel with the patch mitigates the effects of the timeouts, and they no longer lead to aborts. The following message will be observed in dmesg:

[39630.417191] nvme 0000:00:04.0: I/O 0 QID 2 timeout, completion polled

Thanks,

Guilherme

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1788035

Title:
  nvme: avoid cqe corruption

Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Xenial:
  Fix Released

Bug description:
  To address a customer-reported NVMe issue with instance types (notably c5 and m5) that expose EBS volumes as NVMe devices, this commit from mainline v4.6 should be backported to Xenial:

  d783e0bd02e700e7a893ef4fa71c69438ac1c276 nvme: avoid cqe corruption when update at the same time as read

  dmesg sample:

  [Wed Aug 15 01:11:21 2018] nvme 0000:00:1f.0: I/O 8 QID 1 timeout, aborting
  [Wed Aug 15 01:11:21 2018] nvme 0000:00:1f.0: I/O 9 QID 1 timeout, aborting
  [Wed Aug 15 01:11:21 2018] nvme 0000:00:1f.0: I/O 21 QID 2 timeout, aborting
  [Wed Aug 15 01:11:32 2018] nvme 0000:00:1f.0: I/O 10 QID 1 timeout, aborting
  [Wed Aug 15 01:11:51 2018] nvme 0000:00:1f.0: I/O 8 QID 1 timeout, reset controller
  [Wed Aug 15 01:11:51 2018] nvme nvme1: Abort status: 0x2
  [Wed Aug 15 01:11:51 2018] nvme nvme1: Abort status: 0x2
  [Wed Aug 15 01:11:51 2018] nvme nvme1: Abort status: 0x2
  [Wed Aug 15 01:11:51 2018] nvme 0000:00:1f.0: Cancelling I/O 21 QID 2
  [Wed Aug 15 01:11:51 2018] nvme 0000:00:1f.0: completing aborted command with status: 0007
  [Wed Aug 15 01:11:51 2018] blk_update_request: I/O error, dev nvme1n1, sector 83887751
  [Wed Aug 15 01:11:51 2018] blk_update_request: I/O error, dev nvme1n1, sector 83887751
  [Wed Aug 15 01:11:51 2018] nvme 0000:00:1f.0: Cancelling I/O 22 QID 2
  [Wed Aug 15 01:11:51 2018] blk_update_request: I/O error, dev nvme1n1, sector 83887767
  [Wed Aug 15 01:11:51 2018] blk_update_request: I/O error, dev nvme1n1, sector 83887767
  [Wed Aug 15 01:11:51 2018] nvme 0000:00:1f.0: Cancelling I/O 23 QID 2
  [Wed Aug 15 01:11:51 2018] blk_update_request: I/O error, dev nvme1n1, sector 83887769
  [Wed Aug 15 01:11:51 2018] blk_update_request: I/O error, dev nvme1n1, sector 83887769
  [Wed Aug 15 01:11:51 2018] nvme 0000:00:1f.0: Cancelling I/O 8 QID 1
  [Wed Aug 15 01:11:51 2018] nvme 0000:00:1f.0: Cancelling I/O 9 QID 1
  [Wed Aug 15 01:11:51 2018] nvme 0000:00:1f.0: completing aborted command with status: 0007
  [Wed Aug 15 01:11:51 2018] blk_update_request: I/O error, dev nvme1n1, sector 41943136
  [Wed Aug 15 01:11:51 2018] nvme 0000:00:1f.0: Cancelling I/O 10 QID 1
  [Wed Aug 15 01:11:51 2018] nvme 0000:00:1f.0: completing aborted command with status: 0007
  [Wed Aug 15 01:11:51 2018] blk_update_request: I/O error, dev nvme1n1, sector 6976
  [Wed Aug 15 01:11:51 2018] nvme 0000:00:1f.0: Cancelling I/O 22 QID 1
  [Wed Aug 15 01:11:51 2018] nvme 0000:00:1f.0: Cancelling I/O 23 QID 1
  [Wed Aug 15 01:11:51 2018] nvme 0000:00:1f.0: Cancelling I/O 24 QID 1
  [Wed Aug 15 01:11:51 2018] nvme 0000:00:1f.0: Cancelling I/O 25 QID 1
  [Wed Aug 15 01:11:51 2018] nvme 0000:00:1f.0: Cancelling I/O 2 QID 0
  [Wed Aug 15 01:11:51 2018] nvme nvme1: Abort status: 0x7
  [Wed Aug 15 01:11:51 2018] nvme 0000:00:1f.0: completing aborted command with status: fffffffc
  [Wed Aug 15 01:11:51 2018] blk_update_request: I/O error, dev nvme1n1, sector 96
  [Wed Aug 15 01:11:51 2018] XFS (nvme1n1): metadata I/O error: block 0x5000687 ("xlog_iodone") error 5 numblks 64
  [Wed Aug 15 01:11:51 2018] XFS (nvme1n1): xfs_do_force_shutdown(0x2) called from line 1197 of file /build/linux-c2Z51P/linux-4.4.0/fs/xfs/xfs_log.c. Return address = 0xffffffffc075d428
  [Wed Aug 15 01:11:51 2018] XFS (nvme1n1): xfs_log_force: error -5 returned.
  [Wed Aug 15 01:11:51 2018] XFS (nvme1n1): Log I/O Error Detected. Shutting down filesystem
  [Wed Aug 15 01:11:51 2018] XFS (nvme1n1): Please umount the filesystem and rectify the problem(s)
  [Wed Aug 15 01:11:51 2018] Buffer I/O error on dev nvme1n1, logical block 872, lost async page write
  [Wed Aug 15 01:11:51 2018] XFS (nvme1n1): xfs_imap_to_bp: xfs_trans_read_buf() returned error -5.
  [Wed Aug 15 01:11:51 2018] XFS (nvme1n1): xfs_iunlink_remove: xfs_imap_to_bp returned error -5.
  [Wed Aug 15 01:11:51 2018] Buffer I/O error on dev nvme1n1, logical block 873, lost async page write
  [Wed Aug 15 01:11:51 2018] Buffer I/O error on dev nvme1n1, logical block 874, lost async page write
  [Wed Aug 15 01:11:51 2018] Buffer I/O error on dev nvme1n1, logical block 875, lost async page write
  [Wed Aug 15 01:11:51 2018] Buffer I/O error on dev nvme1n1, logical block 876, lost async page write
  [Wed Aug 15 01:11:51 2018] Buffer I/O error on dev nvme1n1, logical block 877, lost async page write
  [Wed Aug 15 01:11:51 2018] Buffer I/O error on dev nvme1n1, logical block 878, lost async page write
  [Wed Aug 15 01:11:51 2018] Buffer I/O error on dev nvme1n1, logical block 879, lost async page write
  [Wed Aug 15 01:11:51 2018] Buffer I/O error on dev nvme1n1, logical block 880, lost async page write
  [Wed Aug 15 01:11:51 2018] Buffer I/O error on dev nvme1n1, logical block 881, lost async page write
  [Wed Aug 15 01:11:51 2018] XFS (nvme1n1): metadata I/O error: block 0x5000697 ("xlog_iodone") error 5 numblks 64
  [Wed Aug 15 01:11:51 2018] XFS (nvme1n1): xfs_log_force: error -5 returned.
  [Wed Aug 15 01:11:51 2018] XFS (nvme1n1): xfs_do_force_shutdown(0x2) called from line 1197 of file /build/linux-c2Z51P/linux-4.4.0/fs/xfs/xfs_log.c. Return address = 0xffffffffc075d428
  [Wed Aug 15 01:11:51 2018] XFS (nvme1n1): metadata I/O error: block 0x5000699 ("xlog_iodone") error 5 numblks 64
  [Wed Aug 15 01:11:51 2018] XFS (nvme1n1): xfs_do_force_shutdown(0x2) called from line 1197 of file /build/linux-c2Z51P/linux-4.4.0/fs/xfs/xfs_log.c. Return address = 0xffffffffc075d428
  [Wed Aug 15 01:12:20 2018] XFS (nvme1n1): xfs_log_force: error -5 returned.
  [Wed Aug 15 01:12:22 2018] nvme 0000:00:1f.0: I/O 22 QID 1 timeout, aborting
  [Wed Aug 15 01:12:22 2018] nvme 0000:00:1f.0: I/O 23 QID 1 timeout, aborting
  [Wed Aug 15 01:12:22 2018] nvme 0000:00:1f.0: I/O 24 QID 1 timeout, aborting
  [Wed Aug 15 01:12:22 2018] nvme 0000:00:1f.0: I/O 25 QID 1 timeout, aborting
  [Wed Aug 15 01:12:22 2018] nvme 0000:00:1f.0: I/O 24 QID 2 timeout, aborting
  [Wed Aug 15 01:12:22 2018] nvme nvme1: Abort status: 0x2
  [Wed Aug 15 01:12:22 2018] nvme nvme1: Abort status: 0x2
  [Wed Aug 15 01:12:22 2018] nvme nvme1: Abort status: 0x2
  [Wed Aug 15 01:12:22 2018] nvme nvme1: Abort status: 0x2
  [Wed Aug 15 01:12:22 2018] nvme nvme1: Abort status: 0x2
  [Wed Aug 15 01:12:50 2018] XFS (nvme1n1): xfs_log_force: error -5 returned.
  [Wed Aug 15 01:12:52 2018] nvme 0000:00:1f.0: I/O 22 QID 1 timeout, reset controller
  [Wed Aug 15 01:13:21 2018] XFS (nvme1n1): xfs_log_force: error -5 returned.
  [Wed Aug 15 01:13:51 2018] XFS (nvme1n1): xfs_log_force: error -5 returned.
  [Wed Aug 15 01:14:21 2018] XFS (nvme1n1): xfs_log_force: error -5 returned.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1788035/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
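To check on a running system whether nvme timeouts are ending in "completion polled" (patched behaviour) or in "aborting" / "reset controller" (as in the dmesg sample from the bug description), the log can be tallied by outcome. The following is a rough sketch, not a supported tool: the function name and the regular expression are made up for illustration and only cover the message formats quoted in this report.

```python
import re
from collections import Counter

# Matches the nvme timeout lines quoted in this report, for example:
#   nvme 0000:00:04.0: I/O 0 QID 2 timeout, completion polled
#   nvme 0000:00:1f.0: I/O 8 QID 1 timeout, aborting
TIMEOUT_RE = re.compile(
    r"nvme (?P<dev>\S+): I/O (?P<tag>\d+) QID (?P<qid>\d+) timeout, (?P<action>.+)$"
)


def count_timeout_actions(log_text):
    """Count nvme timeout outcomes ('completion polled', 'aborting', ...) in dmesg text."""
    counts = Counter()
    for line in log_text.splitlines():
        m = TIMEOUT_RE.search(line)
        if m:
            counts[m.group("action").strip()] += 1
    return counts


# Sample lines copied from the messages quoted in this report.
sample = """\
[39630.417191] nvme 0000:00:04.0: I/O 0 QID 2 timeout, completion polled
[Wed Aug 15 01:11:21 2018] nvme 0000:00:1f.0: I/O 8 QID 1 timeout, aborting
[Wed Aug 15 01:11:51 2018] nvme 0000:00:1f.0: I/O 8 QID 1 timeout, reset controller
[Wed Aug 15 01:11:51 2018] nvme nvme1: Abort status: 0x2
"""

print(count_timeout_actions(sample))
```

In practice the text would come from the saved output of dmesg rather than from an inline sample; on a patched kernel affected by this bug, the expectation from the message above is to see "completion polled" entries and no "aborting" ones.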