I seem to have solved this issue, at least with linux-aws version
4.4.0-1016-aws.  The specific issue I was seeing was order-2
allocations failing when the OOM killer triggered.  At the time I
thought the cause was XFS and memory fragmentation from the very large
number of memory-mapped files in Elasticsearch/Lucene.  When we moved
to EXT4 the rate at which the OOM killer fired dropped, but it did not
stop.  We then made two sysctl changes (listed below), which have
effectively stopped higher-order memory allocations from failing and
the OOM killer from firing.
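
For anyone who wants to check whether fragmentation is what is starving
order-2 allocations on their own hosts, here is a minimal sketch that
sums the free blocks of order 2 and above from /proc/buddyinfo.  It
assumes the usual buddyinfo layout ("Node N, zone NAME" followed by one
free-block count per order, starting at order 0).  The function name is
my own and is only for illustration.

```python
from pathlib import Path


def free_blocks_at_or_above(buddyinfo_text, min_order=2):
    """Return {(node, zone): free block count} for orders >= min_order."""
    totals = {}
    for line in buddyinfo_text.splitlines():
        parts = line.split()
        # Expected shape: "Node 0, zone Normal c0 c1 c2 ..."
        if len(parts) < 5 or parts[0] != "Node" or parts[2] != "zone":
            continue
        node = parts[1].rstrip(",")
        zone = parts[3]
        # The i-th count is the number of free blocks of order i.
        counts = [int(c) for c in parts[4:]]
        totals[(node, zone)] = sum(counts[min_order:])
    return totals


if __name__ == "__main__":
    path = Path("/proc/buddyinfo")
    if path.exists():
        for (node, zone), n in free_blocks_at_or_above(path.read_text()).items():
            print(f"node {node} zone {zone}: {n} free blocks of order >= 2")
```

A zone that shows plenty of order-0 and order-1 blocks but close to zero
at order 2 and above is consistent with the fragmentation theory.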

Note that these settings were used on i3.2xlarge hosts with 60G of RAM,
so your mileage may vary.  We also do not run swap on our servers;
adding swap would likely have helped, but it is not an option for us.

vm.min_free_kbytes = 1000000 # We set this to keep about 1G of RAM
free for the kernel, in the hope that even when memory is heavily
fragmented there is still enough for Linux to satisfy a higher-order
allocation before the OOM killer acts.

vm.zone_reclaim_mode = 1 # Our hope here was to make the kernel more
aggressive about reclaiming memory.
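
To make the two settings above survive a reboot, they can go in a
sysctl.d drop-in file.  Below is a minimal sketch that renders such a
fragment and also shows how large the reserve is relative to RAM.  The
drop-in filename in the comment is an assumption (any file under
/etc/sysctl.d/ would do), and the 60 GiB figure is taken from our
i3.2xlarge hosts.

```python
# Values taken from the two sysctl changes described above.
SETTINGS = {
    "vm.min_free_kbytes": 1000000,
    "vm.zone_reclaim_mode": 1,
}


def render_sysctl_conf(settings):
    """Render settings as 'key = value' lines in sysctl.d syntax."""
    lines = [f"{key} = {value}" for key, value in sorted(settings.items())]
    return "\n".join(lines) + "\n"


def reserve_fraction(min_free_kbytes, ram_gib):
    """Fraction of total RAM that min_free_kbytes keeps free."""
    return min_free_kbytes / (ram_gib * 1024 * 1024)


if __name__ == "__main__":
    # Hypothetical destination: /etc/sysctl.d/60-oom-tuning.conf
    print(render_sysctl_conf(SETTINGS), end="")
    print(f"# reserve is {reserve_fraction(1000000, 60):.1%} of 60 GiB")
```

On a host with much less RAM the same absolute value would reserve a
far larger share of memory, so it is probably worth scaling
vm.min_free_kbytes to the instance size rather than copying it as-is.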

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-aws in Ubuntu.
https://bugs.launchpad.net/bugs/1655842

Title:
  "Out of memory" errors after upgrade to 4.4.0-59

Status in linux package in Ubuntu:
  Fix Released
Status in linux-aws package in Ubuntu:
  New
Status in linux-raspi2 package in Ubuntu:
  Confirmed
Status in linux source package in Xenial:
  Fix Released
Status in linux-aws source package in Xenial:
  New
Status in linux-raspi2 source package in Xenial:
  Confirmed

Bug description:
  I recently replaced some Xenial servers, and started experiencing "Out
  of memory" problems with the default kernel.

  We bake Amazon AMIs based on an official Ubuntu-provided image
  (ami-e6b58e85, in ap-southeast-2, from
  https://cloud-images.ubuntu.com/locator/ec2/).  Previous versions of
  our AMI included "4.4.0-57-generic", but the latest version picked up
  "4.4.0-59-generic" as part of a "dist-upgrade".

  Instances booted using the new AMI have been using more memory, and
  experiencing OOM issues - sometimes during boot, and sometimes a while
  afterwards.  An example from the system log is:

  [  130.113411] cloud-init[1560]: Cloud-init v. 0.7.8 running 'modules:final' at Wed, 11 Jan 2017 22:07:53 +0000. Up 29.28 seconds.
  [  130.124219] cloud-init[1560]: Cloud-init v. 0.7.8 finished at Wed, 11 Jan 2017 22:09:35 +0000. Datasource DataSourceEc2.  Up 130.09 seconds
  [29871.137128] Out of memory: Kill process 2920 (ruby) score 107 or sacrifice child
  [29871.140816] Killed process 2920 (ruby) total-vm:675048kB, anon-rss:51184kB, file-rss:2164kB
  [29871.449209] Out of memory: Kill process 3257 (splunkd) score 97 or sacrifice child
  [29871.453282] Killed process 3258 (splunkd) total-vm:66272kB, anon-rss:6676kB, file-rss:0kB
  [29871.677910] Out of memory: Kill process 2647 (fluentd) score 51 or sacrifice child
  [29871.681872] Killed process 2647 (fluentd) total-vm:117944kB, anon-rss:23956kB, file-rss:1356kB

  I have a hunch that this may be related to the fix for
  https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1647400,
  introduced in linux (4.4.0-58.79).

  ProblemType: Bug
  DistroRelease: Ubuntu 16.04
  Package: linux-image-4.4.0-59-generic 4.4.0-59.80
  ProcVersionSignature: User Name 4.4.0-59.80-generic 4.4.35
  Uname: Linux 4.4.0-59-generic x86_64
  AlsaDevices:
   total 0
   crw-rw---- 1 root audio 116,  1 Jan 12 06:29 seq
   crw-rw---- 1 root audio 116, 33 Jan 12 06:29 timer
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
  ApportVersion: 2.20.1-0ubuntu2.4
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  Date: Thu Jan 12 06:38:45 2017
  Ec2AMI: ami-0f93966c
  Ec2AMIManifest: (unknown)
  Ec2AvailabilityZone: ap-southeast-2a
  Ec2InstanceType: t2.nano
  Ec2Kernel: unavailable
  Ec2Ramdisk: unavailable
  IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
  Lsusb: Error: command ['lsusb'] failed with exit code 1:
  MachineType: Xen HVM domU
  PciMultimedia:

  ProcEnviron:
   TERM=xterm-256color
   PATH=(custom, no user)
   XDG_RUNTIME_DIR=<set>
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  ProcFB: 0 cirrusdrmfb
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.4.0-59-generic root=UUID=fb0fef08-f3c5-40bf-9776-f7ba00fe72be ro console=tty1 console=ttyS0
  RelatedPackageVersions:
   linux-restricted-modules-4.4.0-59-generic N/A
   linux-backports-modules-4.4.0-59-generic  N/A
   linux-firmware                            1.157.6
  RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
  SourcePackage: linux
  UpgradeStatus: No upgrade log present (probably fresh install)
  dmi.bios.date: 12/09/2016
  dmi.bios.vendor: Xen
  dmi.bios.version: 4.2.amazon
  dmi.chassis.type: 1
  dmi.chassis.vendor: Xen
  dmi.modalias: dmi:bvnXen:bvr4.2.amazon:bd12/09/2016:svnXen:pnHVMdomU:pvr4.2.amazon:cvnXen:ct1:cvr:
  dmi.product.name: HVM domU
  dmi.product.version: 4.2.amazon
  dmi.sys.vendor: Xen

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1655842/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp