A note on the above: once hotadd is disabled, the Xen balloon driver will
still perform the memory hotplug, but the added pages won't be available
for use.  You can verify this in /proc/zoneinfo by looking at the Normal
zone, e.g.:

With hotadd enabled (the default in Ubuntu):

Node 0, zone Normal
  pages free 15116671
        min 7661
        low 22873
        high 38085
   node_scanned 0
        spanned 15499264
        present 15499264
        managed 15212161


Notice that the 'spanned' and 'present' values are the same: 'spanned'
includes the physical pages added by the Xen balloon driver, and 'present'
indicates they're available for use (at least some of them are, depending
on how inflated the balloon is).

With memory hotadd disabled (commented out in the udev rules file, as
shown in the comment above):


Node 0, zone   Normal
  pages free     15104522
        min      16356
        low      31567
        high     46778
   node_scanned  0
        spanned  15499264
        present  15466496
        managed  15212150

Notice that the 'spanned' value is the same as before, meaning the Xen
balloon driver still added the physical pages, but the 'present' value
is lower, indicating the extra balloon pages aren't available for the
system to use.  Since the system never uses them, they are never sent to
the NVMe controller, which works around this bug.
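
The comparison above can also be scripted.  The following is a minimal
sketch, not part of any fix: the helper names (parse_zone,
spanned_present_gap) are made up for illustration, and the MiB figure
assumes 4 KiB pages.  It parses /proc/zoneinfo-style text and reports how
many pages are spanned but not present, using the two excerpts from this
comment as input.

```python
import re

PAGE_SIZE_KIB = 4  # assumption: 4 KiB base pages (typical for x86_64)


def parse_zone(zoneinfo_text, node=0, zone="Normal"):
    """Return spanned/present/managed counters for one zone.

    Expects text in the format of /proc/zoneinfo.
    """
    wanted = ("spanned", "present", "managed")
    counters = {}
    in_zone = False
    for line in zoneinfo_text.splitlines():
        header = re.match(r"Node\s+(\d+),\s+zone\s+(\S+)", line)
        if header:
            # Track whether we are inside the requested node/zone section.
            in_zone = int(header.group(1)) == node and header.group(2) == zone
            continue
        if in_zone:
            fields = line.split()
            if len(fields) == 2 and fields[0] in wanted and fields[1].isdigit():
                counters[fields[0]] = int(fields[1])
    return counters


def spanned_present_gap(counters):
    """Pages covered by the zone's address range but not present."""
    return counters["spanned"] - counters["present"]


# Values copied from the two excerpts in this comment.
HOTADD_ENABLED = """Node 0, zone Normal
        spanned 15499264
        present 15499264
        managed 15212161
"""
HOTADD_DISABLED = """Node 0, zone   Normal
        spanned  15499264
        present  15466496
        managed  15212150
"""

for label, text in (("enabled", HOTADD_ENABLED), ("disabled", HOTADD_DISABLED)):
    gap = spanned_present_gap(parse_zone(text))
    print(f"hotadd {label}: {gap} pages ({gap * PAGE_SIZE_KIB // 1024} MiB)")
# hotadd enabled: 0 pages (0 MiB)
# hotadd disabled: 32768 pages (128 MiB)
```

To check a running instance, pass the contents of /proc/zoneinfo to
parse_zone() instead of the sample strings.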


** Changed in: linux-aws (Ubuntu)
       Status: Triaged => In Progress

** Changed in: linux-aws (Ubuntu Xenial)
       Status: Fix Committed => In Progress

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1668129

Title:
  Amazon I3 Instance Buffer I/O error on dev nvme0n1

Status in linux package in Ubuntu:
  Triaged
Status in linux-aws package in Ubuntu:
  In Progress
Status in linux source package in Xenial:
  Triaged
Status in linux-aws source package in Xenial:
  In Progress

Bug description:
  On the AWS i3 instance class, putting the new NVMe storage disks
  under high I/O load causes data corruption and errors in dmesg:

  
  [  662.884390] blk_update_request: I/O error, dev nvme0n1, sector 120063912
  [  662.887824] Buffer I/O error on dev nvme0n1, logical block 14971093, lost async page write
  [  662.891254] Buffer I/O error on dev nvme0n1, logical block 14971094, lost async page write
  [  662.895591] Buffer I/O error on dev nvme0n1, logical block 14971095, lost async page write
  [  662.899873] Buffer I/O error on dev nvme0n1, logical block 14971096, lost async page write
  [  662.904179] Buffer I/O error on dev nvme0n1, logical block 14971097, lost async page write
  [  662.908458] Buffer I/O error on dev nvme0n1, logical block 14971098, lost async page write
  [  662.912287] Buffer I/O error on dev nvme0n1, logical block 14971099, lost async page write
  [  662.916047] Buffer I/O error on dev nvme0n1, logical block 14971100, lost async page write
  [  662.920285] Buffer I/O error on dev nvme0n1, logical block 14971101, lost async page write
  [  662.924565] Buffer I/O error on dev nvme0n1, logical block 14971102, lost async page write
  [  663.645530] blk_update_request: I/O error, dev nvme0n1, sector 120756912
  <snip>
  [ 1012.752265] blk_update_request: I/O error, dev nvme0n1, sector 3744
  [ 1012.755396] buffer_io_error: 194552 callbacks suppressed
  [ 1012.755398] Buffer I/O error on dev nvme0n1, logical block 20, lost async page write
  [ 1012.759248] Buffer I/O error on dev nvme0n1, logical block 21, lost async page write
  [ 1012.763368] Buffer I/O error on dev nvme0n1, logical block 22, lost async page write
  [ 1012.767271] Buffer I/O error on dev nvme0n1, logical block 23, lost async page write
  [ 1012.771314] Buffer I/O error on dev nvme0n1, logical block 24, lost async page write

  This can be reproduced with a bonnie++ stress test:

  bonnie++ -d /mnt/test/ -r 1000

  Linux i-0d76e144d85f487cf 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
  --- 
  AlsaDevices:
   total 0
   crw-rw---- 1 root audio 116,  1 Feb 27 02:12 seq
   crw-rw---- 1 root audio 116, 33 Feb 27 02:12 timer
  AplayDevices: Error: [Errno 2] No such file or directory
  ApportVersion: 2.20.1-0ubuntu2.5
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  CRDA: N/A
  DistroRelease: Ubuntu 16.04
  Ec2AMI: ami-bc62b2aa
  Ec2AMIManifest: (unknown)
  Ec2AvailabilityZone: us-east-1d
  Ec2InstanceType: i3.2xlarge
  Ec2Kernel: unavailable
  Ec2Ramdisk: unavailable
  IwConfig: Error: [Errno 2] No such file or directory
  JournalErrors:
   Error: command ['journalctl', '-b', '--priority=warning', '--lines=1000'] failed with exit code 1: Hint: You are currently not seeing messages from other users and the system.
         Users in the 'systemd-journal' group can see all messages. Pass -q to
         turn off this notice.
   No journal files were opened due to insufficient permissions.
  Lsusb: Error: command ['lsusb'] failed with exit code 1:
  MachineType: Xen HVM domU
  Package: linux (not installed)
  PciMultimedia:
   
  ProcEnviron:
   TERM=screen-256color
   PATH=(custom, no user)
   XDG_RUNTIME_DIR=<set>
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  ProcFB:
   
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.4.0-64-generic root=UUID=cfda0544-9803-41e7-badb-43563085ff3a ro console=tty1 console=ttyS0
  ProcVersionSignature: Ubuntu 4.4.0-64.85-generic 4.4.44
  RelatedPackageVersions:
   linux-restricted-modules-4.4.0-64-generic N/A
   linux-backports-modules-4.4.0-64-generic  N/A
   linux-firmware                            N/A
  RfKill: Error: [Errno 2] No such file or directory
  Tags:  xenial ec2-images
  Uname: Linux 4.4.0-64-generic x86_64
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups:
   
  WifiSyslog:
   
  _MarkForUpload: True
  dmi.bios.date: 12/12/2016
  dmi.bios.vendor: Xen
  dmi.bios.version: 4.2.amazon
  dmi.chassis.type: 1
  dmi.chassis.vendor: Xen
  dmi.modalias: dmi:bvnXen:bvr4.2.amazon:bd12/12/2016:svnXen:pnHVMdomU:pvr4.2.amazon:cvnXen:ct1:cvr:
  dmi.product.name: HVM domU
  dmi.product.version: 4.2.amazon
  dmi.sys.vendor: Xen

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1668129/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
