Not sure whether the issue is a poor interaction between sd-pam and the kernel or strictly a kernel issue.

Kernel timeout backtrace:

Sep 21 03:00:33 mainframe01 kernel: [292411.276266] Not tainted 4.15.0-1021-aws #21-Ubuntu
Sep 21 03:00:33 mainframe01 kernel: [292411.277931] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 21 03:00:33 mainframe01 kernel: [292411.280331] kworker/u8:5 D 0 25806 2 0x80000080
Sep 21 03:00:33 mainframe01 kernel: [292411.280339] Workqueue: events_unbound fsnotify_mark_destroy_workfn
Sep 21 03:00:33 mainframe01 kernel: [292411.280340] Call Trace:
Sep 21 03:00:33 mainframe01 kernel: [292411.280347] __schedule+0x291/0x8a0
Sep 21 03:00:33 mainframe01 kernel: [292411.280349] schedule+0x2c/0x80
Sep 21 03:00:33 mainframe01 kernel: [292411.280350] schedule_timeout+0x1cf/0x350
Sep 21 03:00:33 mainframe01 kernel: [292411.280354] ? add_timer+0x124/0x280
Sep 21 03:00:33 mainframe01 kernel: [292411.280357] wait_for_completion+0xba/0x140
Sep 21 03:00:33 mainframe01 kernel: [292411.280362] ? wake_up_q+0x80/0x80
Sep 21 03:00:33 mainframe01 kernel: [292411.280365] __synchronize_srcu.part.13+0x85/0xb0
Sep 21 03:00:33 mainframe01 kernel: [292411.280367] ? trace_raw_output_rcu_utilization+0x50/0x50
Sep 21 03:00:33 mainframe01 kernel: [292411.280369] synchronize_srcu+0x66/0xe0
Sep 21 03:00:33 mainframe01 kernel: [292411.280370] ? synchronize_srcu+0x66/0xe0
Sep 21 03:00:33 mainframe01 kernel: [292411.280372] fsnotify_mark_destroy_workfn+0x7b/0xe0
Sep 21 03:00:33 mainframe01 kernel: [292411.280375] process_one_work+0x1de/0x410
Sep 21 03:00:33 mainframe01 kernel: [292411.280377] worker_thread+0x253/0x410
Sep 21 03:00:33 mainframe01 kernel: [292411.280379] kthread+0x121/0x140
Sep 21 03:00:33 mainframe01 kernel: [292411.280380] ? process_one_work+0x410/0x410
Sep 21 03:00:33 mainframe01 kernel: [292411.280382] ? kthread_create_worker_on_cpu+0x70/0x70
Sep 21 03:00:33 mainframe01 kernel: [292411.280385] ? do_syscall_64+0x73/0x130
Sep 21 03:00:33 mainframe01 kernel: [292411.280387] ? SyS_exit+0x17/0x20
Sep 21 03:00:33 mainframe01 kernel: [292411.280391] ret_from_fork+0x35/0x40

** Information type changed from Private Security to Public Security

** Also affects: linux (Ubuntu)
   Importance: Undecided
       Status: New

** Tags added: bionic

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1794169

Title:
  AWS ubuntu became unreachable after ssh login

Status in linux package in Ubuntu:
  Incomplete
Status in systemd package in Ubuntu:
  New

Bug description:

I've run into a strange situation with Ubuntu 18.04 LTS with the latest kernel on an AWS m5.xlarge instance. The system became unreachable after a series of successful ssh logins. systemd --user became a zombie and blocked the main systemd daemon (PID 1).

I created bug https://github.com/systemd/systemd/issues/10123 but it was closed with "there's a problem with your kernel":
https://github.com/systemd/systemd/issues/10123#issuecomment-423984751

Symptoms are very similar to https://github.com/systemd/systemd/issues/8598

apetren+ 26679 0.0 0.0      0    0 ? Z  02:56 0:00  \_ [(sd-pam)] <defunct>
apetren+ 26855 0.0 0.0  76636 7816 ? Ds 02:57 0:00 /lib/systemd/systemd --user
apetren+ 26856 0.0 0.0      0    0 ? Z  02:57 0:00  \_ [(sd-pam)] <defunct>
apetren+ 26954 0.0 0.0      0    0 ? Zs 02:57 0:00  \_ [kill] <defunct>
apetren+ 27053 0.0 0.0  76636 7496 ? Ss 02:58 0:00 /lib/systemd/systemd --user
apetren+ 27054 0.0 0.0 193972 2768 ? S  02:58 0:00  \_ (sd-pam)

This situation is repeatable on 7 instances, 1-2 times per week.

How to repeat:
1. Install Ubuntu 18.04 LTS from the official Ubuntu image.
2. Update the kernel and packages to the latest version.
3. From another instance run:
   while `true` ;do ssh ubu...@your.instance.ip "hostname; ps -ef|grep defunc |grep -v grep" ; done

With this command, within a couple of days I get 2->4->6->8... zombies, and within an hour after that the system is frozen. sudo reboot does not work because systemd with PID 1 is unreachable. kill -9 1 does not work either.
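
For anyone trying to reproduce this, the growth in zombies could be tracked with something like the sketch below. It is not part of the original report: the function names count_defunct and watch_zombies, the target argument, and the polling interval are all hypothetical, and it assumes the process state is the 8th column, as in the ps listing above.

```shell
#!/bin/sh
# Hypothetical helper: count zombie (state Z*) processes in ps output on stdin.
# Assumes the STAT field is column 8, as in `ps aux` and the listing above.
count_defunct() {
    awk '$8 ~ /^Z/ { n++ } END { print n + 0 }'
}

# Hypothetical helper: log the remote zombie count once per interval.
# $1 is an ssh target (placeholder), $2 an optional interval in seconds.
watch_zombies() {
    target=$1
    interval=${2:-60}
    while true; do
        n=$(ssh "$target" 'ps aux' | count_defunct)
        printf '%s zombies=%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$n"
        sleep "$interval"
    done
}
```

Nothing runs until watch_zombies is called with an ssh target. Once PID 1 is blocked the ssh call itself may hang, so the last logged line roughly marks when the system froze.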

# uname -r:
Linux mainframe04 4.15.0-1021-aws #21-Ubuntu SMP Tue Aug 28 10:23:07 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

# cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18.04.1 LTS"

# systemd --version
systemd 237
+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN2 +IDN -PCRE2 default-hierarchy=hybrid

AWS instance m5.xlarge

Please let me know if you need any information.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1794169/+subscriptions