On an i3 instance in east-1, where I can reproduce fairly easily, the errors I'm getting unfortunately don't help. The nvme controller is failing some requests, but it isn't providing any useful info about why it doesn't like the requests. For example, here is some debug I added:

[ 1464.634709] nvme nvme0: invalid field command_id 3eb qid 5 cmd_type 1 cmd_flags 4001 data_dir 1 status 2002

The controller is failing a request with error 2, "invalid field", and sets the "more error data" flag, 0x2000. So I pulled the error log page, which is supposed to provide more data about why the request failed.

[ 1464.634836] nvme nvme0: error log entry: count 5d5281a qid 5 command_id 3eb status 2002 byte ff bit ff lba 0 ns 1 vendor 0 csi 0

The nvme controller error log is supposed to provide details about the failure, but this provides no new info. The qid and command_id match the failure above, but the error location fields (byte and bit), which are supposed to point to the specific byte/bit in the request that the controller doesn't like, are set to 0xffff, which means "If the error is not specific to a particular command then this field shall be set to FFFFh". So that's totally unhelpful.

Instead of trying to determine what the controller's unhappy about, I'll try bisecting with an older kernel, to find the commit that introduces the failure.
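
For reference, the status value 2002 from the debug line can be unpacked as sketched below. This is only an illustration, assuming the printed value is the completion status with the phase bit already shifted out, so that the status code sits in bits 7:0, the status code type in bits 10:8, and the "more" and "do not retry" flags are 0x2000 and 0x4000. The helper decode_status and the small name table are hypothetical and are not part of the debug patch.

```python
# Assumed layout of the printed status value (phase bit already removed).
SC_MASK = 0xFF        # status code, bits 7:0
SCT_SHIFT = 8         # status code type, bits 10:8
SCT_MASK = 0x7
MORE_FLAG = 0x2000    # more info should be available in the error log page
DNR_FLAG = 0x4000     # do not retry

# Only a few generic status codes, enough for this example.
GENERIC_STATUS = {
    0x00: "success",
    0x01: "invalid opcode",
    0x02: "invalid field",
}


def decode_status(status: int) -> dict:
    """Split a status value into its code, type, and flag bits."""
    sc = status & SC_MASK
    sct = (status >> SCT_SHIFT) & SCT_MASK
    if sct == 0:
        name = GENERIC_STATUS.get(sc, "unknown generic status")
    else:
        name = "non-generic status"
    return {
        "sc": sc,
        "sct": sct,
        "name": name,
        "more": bool(status & MORE_FLAG),
        "dnr": bool(status & DNR_FLAG),
    }


if __name__ == "__main__":
    # Value taken from the debug line: "status 2002" (hex).
    print(decode_status(0x2002))
```

With the value from the log, this reports status code 2 ("invalid field") with the "more" flag set and "do not retry" clear, which matches the reading above.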

https://bugs.launchpad.net/bugs/1668129

Title:
 Amazon I3 Instance Buffer I/O error on dev nvme0n1

Status in linux package in Ubuntu:
 Triaged
Status in linux source package in Xenial:
 Triaged

Bug description:

On the AWS i3 instance class - when putting the new NVME storage disks under high IO load - seeing data corruption and errors in dmesg

[ 662.884390] blk_update_request: I/O error, dev nvme0n1, sector 120063912
[ 662.887824] Buffer I/O error on dev nvme0n1, logical block 14971093, lost async page write
[ 662.891254] Buffer I/O error on dev nvme0n1, logical block 14971094, lost async page write
[ 662.895591] Buffer I/O error on dev nvme0n1, logical block 14971095, lost async page write
[ 662.899873] Buffer I/O error on dev nvme0n1, logical block 14971096, lost async page write
[ 662.904179] Buffer I/O error on dev nvme0n1, logical block 14971097, lost async page write
[ 662.908458] Buffer I/O error on dev nvme0n1, logical block 14971098, lost async page write
[ 662.912287] Buffer I/O error on dev nvme0n1, logical block 14971099, lost async page write
[ 662.916047] Buffer I/O error on dev nvme0n1, logical block 14971100, lost async page write
[ 662.920285] Buffer I/O error on dev nvme0n1, logical block 14971101, lost async page write
[ 662.924565] Buffer I/O error on dev nvme0n1, logical block 14971102, lost async page write
[ 663.645530] blk_update_request: I/O error, dev nvme0n1, sector 120756912
<snip>
[ 1012.752265] blk_update_request: I/O error, dev nvme0n1, sector 3744
[ 1012.755396] buffer_io_error: 194552 callbacks suppressed
[ 1012.755398] Buffer I/O error on dev nvme0n1, logical block 20, lost async page write
[ 1012.759248] Buffer I/O error on dev nvme0n1, logical block 21, lost async page write
[ 1012.763368] Buffer I/O error on dev nvme0n1, logical block 22, lost async page write
[ 1012.767271] Buffer I/O error on dev nvme0n1, logical block 23, lost async page write
[ 1012.771314] Buffer I/O error on dev nvme0n1, logical block 24, lost async page write

Able to replicate this with a bonnie++ stress test.

bonnie++ -d /mnt/test/ -r 1000

Linux i-0d76e144d85f487cf 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
---
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Feb 27 02:12 seq
 crw-rw---- 1 root audio 116, 33 Feb 27 02:12 timer
AplayDevices: Error: [Errno 2] No such file or directory
ApportVersion: 2.20.1-0ubuntu2.5
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: N/A
DistroRelease: Ubuntu 16.04
Ec2AMI: ami-bc62b2aa
Ec2AMIManifest: (unknown)
Ec2AvailabilityZone: us-east-1d
Ec2InstanceType: i3.2xlarge
Ec2Kernel: unavailable
Ec2Ramdisk: unavailable
IwConfig: Error: [Errno 2] No such file or directory
JournalErrors:
 Error: command ['journalctl', '-b', '--priority=warning', '--lines=1000'] failed with exit code 1: Hint: You are currently not seeing messages from other users and the system.
 Users in the 'systemd-journal' group can see all messages. Pass -q to turn off this notice.
 No journal files were opened due to insufficient permissions.
Lsusb: Error: command ['lsusb'] failed with exit code 1:
MachineType: Xen HVM domU
Package: linux (not installed)
PciMultimedia:

ProcEnviron:
 TERM=screen-256color
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB:

ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.4.0-64-generic root=UUID=cfda0544-9803-41e7-badb-43563085ff3a ro console=tty1 console=ttyS0
ProcVersionSignature: Ubuntu 4.4.0-64.85-generic 4.4.44
RelatedPackageVersions:
 linux-restricted-modules-4.4.0-64-generic N/A
 linux-backports-modules-4.4.0-64-generic N/A
 linux-firmware N/A
RfKill: Error: [Errno 2] No such file or directory
Tags: xenial ec2-images
Uname: Linux 4.4.0-64-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups:

WifiSyslog:

_MarkForUpload: True
dmi.bios.date: 12/12/2016
dmi.bios.vendor: Xen
dmi.bios.version: 4.2.amazon
dmi.chassis.type: 1
dmi.chassis.vendor: Xen
dmi.modalias: dmi:bvnXen:bvr4.2.amazon:bd12/12/2016:svnXen:pnHVMdomU:pvr4.2.amazon:cvnXen:ct1:cvr:
dmi.product.name: HVM domU
dmi.product.version: 4.2.amazon
dmi.sys.vendor: Xen