Maybe, but we would need more information to say for sure.

There have been no changes in apparmor between the reported working
20180109 and 20180126.

The warning
> "Warning failed to create cache: usr.sbin.sssd" before the instance

just means that apparmor was not able to cache the binary policy that it
loaded. This is not unusual if the policy configuration hasn't been
updated for some image configurations, e.g. if /etc/ is read-only and
the apparmor cache is at its default location of /etc/apparmor.d/cache.
This warning would come during package install or boot, before sshd is
run.

We can easily test whether apparmor policy load is causing the issue by
manually calling the apparmor_parser on policy separate from invoking
the application/services associated with the fault.

  sudo apparmor_parser -rK /etc/apparmor.d/
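
If loading the whole directory does trigger the hang, a rough way to
narrow it down is to load the profiles one file at a time and print each
name first, so the last name printed points at the suspect. This is only
a sketch: the function name and the APPARMOR_PARSER override are made up
for illustration, and it assumes profiles are the regular files directly
under /etc/apparmor.d (subdirectories such as the cache are skipped).

```shell
# Hypothetical helper, not part of apparmor: load profiles one by one.
# $1               profile directory (default /etc/apparmor.d)
# APPARMOR_PARSER  parser command to run (default apparmor_parser);
#                  the override exists only so the loop can be dry-run.
load_profiles_one_by_one() {
    dir=${1:-/etc/apparmor.d}
    parser=${APPARMOR_PARSER:-apparmor_parser}
    for f in "$dir"/*; do
        # Skip subdirectories and anything that is not a regular file.
        [ -f "$f" ] || continue
        # Print the name before loading, in case the load never returns.
        echo "loading $f"
        # Same flags as the manual command: -r replace, -K skip cache read.
        $parser -rK "$f" || echo "FAILED: $f"
    done
}
```

It needs to run as root, e.g. from a root shell after sourcing the
function, and the output is more useful on a serial console or a session
that survives the instance locking up.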

We can also decouple apparmor policy enforcement from the
application/services by disabling the profile on the instance
  sudo aa-disable /etc/apparmor.d/usr.sbin.sssd

or all profiles
  sudo systemctl disable apparmor.service

and we can keep the kernel from using apparmor at all by adding the
kernel parameter at boot
  apparmor=0
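
After rebooting it is worth confirming that the parameter actually made
it onto the kernel command line before drawing conclusions from the
test. A minimal sketch follows; the function name is hypothetical, and
the optional file argument is there only so it can be tried against a
sample file instead of /proc/cmdline.

```shell
# Hypothetical helper: report whether apparmor=0 is on the command line.
# $1  file holding the kernel command line (default /proc/cmdline)
apparmor_disabled_on_cmdline() {
    cmdline_file=${1:-/proc/cmdline}
    # Pad with spaces so the match works at the start or end of the line
    # and does not match values such as apparmor=01.
    case " $(cat "$cmdline_file") " in
        *" apparmor=0 "*) echo "apparmor=0 present" ;;
        *)                echo "apparmor=0 not present" ;;
    esac
}
```

On the running instance, calling apparmor_disabled_on_cmdline with no
argument reads /proc/cmdline. As a cross-check,
/sys/module/apparmor/parameters/enabled should read N when apparmor is
off, assuming the kernel exposes that file.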

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1746806

Title:
  sssd appears to crash AWS c5 and m5 instances, cause 100% CPU

Status in cloud-images:
  New
Status in linux package in Ubuntu:
  Confirmed
Status in sssd package in Ubuntu:
  Confirmed

Bug description:
  After upgrading to the Ubuntu EC2 AMI from 20180126 (specifically
  ami-79873901 in us-west-2) we have seen sssd hard locking c5 and m5
  EC2 instances after starting the service and CPU goes to 100%.

  We do not experience this issue with t2 or c4 instance types and we do
  not see this issue on any instance types using Ubuntu Cloud images
  from 20180109 or before. I have verified that this is kernel related
  as I booted an image that we created using the Ubuntu cloud image from
  20180109 which works fine on a c5. I then did a "apt update && apt
  install --only-upgrade linux-aws && systemctl disable sssd", rebooted
  the server, verified I was on the new kernel and started sssd with
  "systemctl start sssd" and the EC2 instance froze and Cloudwatch CPU
  usage for that instance went to 100%.

  I haven't been able to find much in the syslog, kern.log, journalctl
  logs, etc. The only thing I have been able to find is that when this
  happens I tend to see "^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@" in
  the syslog and sssd log files.  I have attached several log files and
  the output of a "apport-bug /usr/sbin/sssd". Let me know if you need
  anything else to help track this down.

  Thanks,
  Paul
