On an i3 instance in us-east-1, where I can reproduce the problem fairly
easily, the errors I'm getting unfortunately don't help. The NVMe
controller is failing some requests, but it isn't providing any useful
information about why it rejects them. For example, here is the output
of some debug I added:

[ 1464.634709] nvme nvme0: invalid field command_id 3eb qid 5 cmd_type 1 cmd_flags 4001 data_dir 1 status 2002

The controller is failing a request with status code 2, "invalid field",
and has set the "more error data" flag, 0x2000. So I pulled the error log
page, which is supposed to provide more data about why the request failed:
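
For reference, the status value can be split into its fields with a few
lines of Python. This is only a sketch: it assumes the value printed by the
debug line above uses the completion status layout with the phase bit
already shifted out (status code in bits 7:0, status code type in bits 10:8,
"more" at 0x2000, "do not retry" at 0x4000). decode_nvme_status is a
made-up helper name, not a kernel function.

```python
def decode_nvme_status(status):
    """Split an NVMe completion status value into its fields.

    Assumes the phase bit has already been shifted out, so the status
    code occupies the low byte.
    """
    return {
        "sc": status & 0xFF,            # status code (2 = invalid field)
        "sct": (status >> 8) & 0x7,     # status code type (0 = generic)
        "more": bool(status & 0x2000),  # more data in the error log page
        "dnr": bool(status & 0x4000),   # do not retry
    }


# Value taken from the debug line above.
print(decode_nvme_status(0x2002))
```

For 0x2002 this gives status code 2 with the "more" flag set and "do not
retry" clear, which matches the reading above.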

[ 1464.634836] nvme nvme0: error log entry: count 5d5281a qid 5 command_id 3eb status 2002 byte ff bit ff lba 0 ns 1 vendor 0 csi 0

The NVMe controller error log is supposed to provide details about the
failure, but this entry provides no new information. The qid and
command_id match the failure above, but the error location fields (byte
and bit), which are supposed to point to the specific byte and bit in the
request that the controller doesn't like, are set to 0xffff. Per the
spec, that means "If the error is not specific to a particular command
then this field shall be set to FFFFh", so that's totally unhelpful.
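
A small sketch of how the error location could be interpreted, in Python.
It assumes the "byte" and "bit" values printed above are the low and high
bytes of the 16-bit Parameter Error Location field, with the byte offset in
bits 7:0 and the bit offset in bits 10:8. decode_error_location is a
hypothetical helper written for illustration only.

```python
NOT_COMMAND_SPECIFIC = 0xFFFF  # "not specific to a particular command"


def decode_error_location(byte_field, bit_field):
    """Return the offending byte/bit in the command, or None if unspecified.

    byte_field and bit_field are assumed to be the low and high bytes of
    the 16-bit Parameter Error Location field.
    """
    raw = ((bit_field & 0xFF) << 8) | (byte_field & 0xFF)
    if raw == NOT_COMMAND_SPECIFIC:
        return None
    return {
        "byte": raw & 0xFF,       # byte offset within the command
        "bit": (raw >> 8) & 0x7,  # bit offset within that byte
    }


# Values from the error log entry above: byte ff, bit ff.
print(decode_error_location(0xFF, 0xFF))
```

With the values from the log entry this returns None, i.e. the controller
did not point at any particular byte or bit of the command.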

Instead of trying to determine what the controller's unhappy about, I'll
try bisecting from an older kernel to find the commit that introduces
the failure.
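
The idea behind bisecting can be sketched as a binary search over an
ordered list of commits. This is an illustration of the technique only, not
the actual tooling: first_bad and is_bad are hypothetical names, and the
integer "commits" in the example are placeholders, not real kernel
revisions. It assumes the oldest entry is known good and the newest is
known bad.

```python
def first_bad(commits, is_bad):
    """Find the first failing commit by binary search.

    commits is ordered oldest to newest; commits[0] must be known good
    and commits[-1] known bad. is_bad(commit) would build and test a
    kernel at that commit and report whether the I/O errors reproduce.
    Returns the first bad commit and the number of tests performed.
    """
    good, bad = 0, len(commits) - 1
    steps = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        steps += 1
        if is_bad(commits[mid]):
            bad = mid    # failure was introduced at or before mid
        else:
            good = mid   # failure was introduced after mid
    return commits[bad], steps


# Placeholder history of 16 commits where the failure appears at commit 11.
culprit, tests = first_bad(list(range(16)), lambda c: c >= 11)
print(culprit, tests)
```

Each test halves the remaining range, so the number of kernels to build and
test grows only with the logarithm of the number of commits.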

https://bugs.launchpad.net/bugs/1668129

Title:
  Amazon I3 Instance Buffer I/O error on dev nvme0n1

Status in linux package in Ubuntu:
  Triaged
Status in linux source package in Xenial:
  Triaged

Bug description:
  On the AWS i3 instance class, putting the new NVMe storage disks under
  high I/O load causes data corruption and errors in dmesg:

  
  [  662.884390] blk_update_request: I/O error, dev nvme0n1, sector 120063912
  [  662.887824] Buffer I/O error on dev nvme0n1, logical block 14971093, lost async page write
  [  662.891254] Buffer I/O error on dev nvme0n1, logical block 14971094, lost async page write
  [  662.895591] Buffer I/O error on dev nvme0n1, logical block 14971095, lost async page write
  [  662.899873] Buffer I/O error on dev nvme0n1, logical block 14971096, lost async page write
  [  662.904179] Buffer I/O error on dev nvme0n1, logical block 14971097, lost async page write
  [  662.908458] Buffer I/O error on dev nvme0n1, logical block 14971098, lost async page write
  [  662.912287] Buffer I/O error on dev nvme0n1, logical block 14971099, lost async page write
  [  662.916047] Buffer I/O error on dev nvme0n1, logical block 14971100, lost async page write
  [  662.920285] Buffer I/O error on dev nvme0n1, logical block 14971101, lost async page write
  [  662.924565] Buffer I/O error on dev nvme0n1, logical block 14971102, lost async page write
  [  663.645530] blk_update_request: I/O error, dev nvme0n1, sector 120756912
  <snip>
  [ 1012.752265] blk_update_request: I/O error, dev nvme0n1, sector 3744
  [ 1012.755396] buffer_io_error: 194552 callbacks suppressed
  [ 1012.755398] Buffer I/O error on dev nvme0n1, logical block 20, lost async page write
  [ 1012.759248] Buffer I/O error on dev nvme0n1, logical block 21, lost async page write
  [ 1012.763368] Buffer I/O error on dev nvme0n1, logical block 22, lost async page write
  [ 1012.767271] Buffer I/O error on dev nvme0n1, logical block 23, lost async page write
  [ 1012.771314] Buffer I/O error on dev nvme0n1, logical block 24, lost async page write

  Able to replicate this with a bonnie++ stress test.

  bonnie++ -d /mnt/test/ -r 1000

  Linux i-0d76e144d85f487cf 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
  --- 
  AlsaDevices:
   total 0
   crw-rw---- 1 root audio 116,  1 Feb 27 02:12 seq
   crw-rw---- 1 root audio 116, 33 Feb 27 02:12 timer
  AplayDevices: Error: [Errno 2] No such file or directory
  ApportVersion: 2.20.1-0ubuntu2.5
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  CRDA: N/A
  DistroRelease: Ubuntu 16.04
  Ec2AMI: ami-bc62b2aa
  Ec2AMIManifest: (unknown)
  Ec2AvailabilityZone: us-east-1d
  Ec2InstanceType: i3.2xlarge
  Ec2Kernel: unavailable
  Ec2Ramdisk: unavailable
  IwConfig: Error: [Errno 2] No such file or directory
  JournalErrors:
   Error: command ['journalctl', '-b', '--priority=warning', '--lines=1000'] failed with exit code 1: Hint: You are currently not seeing messages from other users and the system.
         Users in the 'systemd-journal' group can see all messages. Pass -q to
         turn off this notice.
   No journal files were opened due to insufficient permissions.
  Lsusb: Error: command ['lsusb'] failed with exit code 1:
  MachineType: Xen HVM domU
  Package: linux (not installed)
  PciMultimedia:
   
  ProcEnviron:
   TERM=screen-256color
   PATH=(custom, no user)
   XDG_RUNTIME_DIR=<set>
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  ProcFB:
   
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.4.0-64-generic root=UUID=cfda0544-9803-41e7-badb-43563085ff3a ro console=tty1 console=ttyS0
  ProcVersionSignature: Ubuntu 4.4.0-64.85-generic 4.4.44
  RelatedPackageVersions:
   linux-restricted-modules-4.4.0-64-generic N/A
   linux-backports-modules-4.4.0-64-generic  N/A
   linux-firmware                            N/A
  RfKill: Error: [Errno 2] No such file or directory
  Tags:  xenial ec2-images
  Uname: Linux 4.4.0-64-generic x86_64
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups:
   
  WifiSyslog:
   
  _MarkForUpload: True
  dmi.bios.date: 12/12/2016
  dmi.bios.vendor: Xen
  dmi.bios.version: 4.2.amazon
  dmi.chassis.type: 1
  dmi.chassis.vendor: Xen
  dmi.modalias: dmi:bvnXen:bvr4.2.amazon:bd12/12/2016:svnXen:pnHVMdomU:pvr4.2.amazon:cvnXen:ct1:cvr:
  dmi.product.name: HVM domU
  dmi.product.version: 4.2.amazon
  dmi.sys.vendor: Xen

