I'm not sure whether this is a bad interaction between sd-pam and the
kernel or strictly a kernel issue.

Kernel timeout backtrace:

Sep 21 03:00:33 mainframe01 kernel: [292411.276266]       Not tainted 4.15.0-1021-aws #21-Ubuntu
Sep 21 03:00:33 mainframe01 kernel: [292411.277931] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 21 03:00:33 mainframe01 kernel: [292411.280331] kworker/u8:5    D    0 25806      2 0x80000080
Sep 21 03:00:33 mainframe01 kernel: [292411.280339] Workqueue: events_unbound fsnotify_mark_destroy_workfn
Sep 21 03:00:33 mainframe01 kernel: [292411.280340] Call Trace:
Sep 21 03:00:33 mainframe01 kernel: [292411.280347]  __schedule+0x291/0x8a0
Sep 21 03:00:33 mainframe01 kernel: [292411.280349]  schedule+0x2c/0x80
Sep 21 03:00:33 mainframe01 kernel: [292411.280350]  schedule_timeout+0x1cf/0x350
Sep 21 03:00:33 mainframe01 kernel: [292411.280354]  ? add_timer+0x124/0x280
Sep 21 03:00:33 mainframe01 kernel: [292411.280357]  wait_for_completion+0xba/0x140
Sep 21 03:00:33 mainframe01 kernel: [292411.280362]  ? wake_up_q+0x80/0x80
Sep 21 03:00:33 mainframe01 kernel: [292411.280365]  __synchronize_srcu.part.13+0x85/0xb0
Sep 21 03:00:33 mainframe01 kernel: [292411.280367]  ? trace_raw_output_rcu_utilization+0x50/0x50
Sep 21 03:00:33 mainframe01 kernel: [292411.280369]  synchronize_srcu+0x66/0xe0
Sep 21 03:00:33 mainframe01 kernel: [292411.280370]  ? synchronize_srcu+0x66/0xe0
Sep 21 03:00:33 mainframe01 kernel: [292411.280372]  fsnotify_mark_destroy_workfn+0x7b/0xe0
Sep 21 03:00:33 mainframe01 kernel: [292411.280375]  process_one_work+0x1de/0x410
Sep 21 03:00:33 mainframe01 kernel: [292411.280377]  worker_thread+0x253/0x410
Sep 21 03:00:33 mainframe01 kernel: [292411.280379]  kthread+0x121/0x140
Sep 21 03:00:33 mainframe01 kernel: [292411.280380]  ? process_one_work+0x410/0x410
Sep 21 03:00:33 mainframe01 kernel: [292411.280382]  ? kthread_create_worker_on_cpu+0x70/0x70
Sep 21 03:00:33 mainframe01 kernel: [292411.280385]  ? do_syscall_64+0x73/0x130
Sep 21 03:00:33 mainframe01 kernel: [292411.280387]  ? SyS_exit+0x17/0x20
Sep 21 03:00:33 mainframe01 kernel: [292411.280391]  ret_from_fork+0x35/0x40
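
The kworker in the trace is in state D (uninterruptible sleep). To see
whether other tasks are stuck the same way, a filter along these lines
might help. This is only a rough sketch: the helper name list_d_state is
made up for illustration, and it assumes ps output with the state in the
second column, as produced by "ps -eo pid,stat,comm".

```shell
#!/bin/sh
# list_d_state: hypothetical helper, not part of any package.
# Reads "ps -eo pid,stat,comm"-style lines on stdin, skips the header
# line, and prints every line whose state field starts with D
# (uninterruptible sleep).
list_d_state() {
    awk 'NR > 1 && $2 ~ /^D/ { print $0 }'
}

# Usage on a live system:
#   ps -eo pid,stat,comm | list_d_state
```

Reading from stdin rather than calling ps inside the function keeps it
usable on saved ps output from an instance that has already frozen.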

** Information type changed from Private Security to Public Security

** Also affects: linux (Ubuntu)
   Importance: Undecided
       Status: New

** Tags added: bionic

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1794169

Title:
  AWS ubuntu became unreachable after ssh login

Status in linux package in Ubuntu:
  Incomplete
Status in systemd package in Ubuntu:
  New

Bug description:
  I've run into a strange situation with Ubuntu 18.04 LTS and the latest
  kernel on an AWS m5.xlarge instance.

  The system became unreachable after a series of successful ssh logins.
  systemd --user became a zombie and blocked the main systemd daemon (PID 1).

  I filed https://github.com/systemd/systemd/issues/10123, but it was
  closed with "there's a problem with your kernel":
  https://github.com/systemd/systemd/issues/10123#issuecomment-423984751

  Symptoms are very similar to
  https://github.com/systemd/systemd/issues/8598

  apetren+ 26679  0.0  0.0      0     0 ?        Z    02:56   0:00  \_ [(sd-pam)] <defunct>
  apetren+ 26855  0.0  0.0  76636  7816 ?        Ds   02:57   0:00 /lib/systemd/systemd --user
  apetren+ 26856  0.0  0.0      0     0 ?        Z    02:57   0:00  \_ [(sd-pam)] <defunct>
  apetren+ 26954  0.0  0.0      0     0 ?        Zs   02:57   0:00  \_ [kill] <defunct>
  apetren+ 27053  0.0  0.0  76636  7496 ?        Ss   02:58   0:00 /lib/systemd/systemd --user
  apetren+ 27054  0.0  0.0 193972  2768 ?        S    02:58   0:00  \_ (sd-pam)

  This is reproducible on 7 instances, 1-2 times per week.

  How to reproduce:
  1. Install Ubuntu 18.04 LTS from the official Ubuntu image.
  2. Update the kernel and packages to the latest versions.
  3. From another instance, run:

  while true; do ssh ubu...@your.instance.ip "hostname; ps -ef | grep defunc | grep -v grep"; done

  With this command, after a couple of days I see 2 -> 4 -> 6 -> 8 ...
  zombies, and within an hour the system is frozen.
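
A rough way to track how the zombie count grows over time, instead of
reading the grep output by eye. This is a sketch under assumptions: the
helper name count_zombies and the 60-second interval are invented for
illustration, and the input is expected in "ps -eo stat,comm" format.

```shell
#!/bin/sh
# count_zombies: hypothetical helper, not part of any package.
# Reads "ps -eo stat,comm"-style lines on stdin, skips the header line,
# and prints the number of processes whose state starts with Z.
count_zombies() {
    awk 'NR > 1 && $1 ~ /^Z/ { n++ } END { print n + 0 }'
}

# Example: log a timestamped zombie count once a minute
# (the interval is arbitrary):
#   while true; do
#       printf '%s %s\n' "$(date -u +%FT%TZ)" "$(ps -eo stat,comm | count_zombies)"
#       sleep 60
#   done
```

Logging the count with a timestamp should make it easier to line up the
2 -> 4 -> 6 -> 8 growth with the kernel log.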

  sudo reboot does not work because systemd (PID 1) is unresponsive.
  kill -9 1 does not work either.

  # uname -a
  Linux mainframe04 4.15.0-1021-aws #21-Ubuntu SMP Tue Aug 28 10:23:07 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

  # cat /etc/lsb-release
  DISTRIB_ID=Ubuntu
  DISTRIB_RELEASE=18.04
  DISTRIB_CODENAME=bionic
  DISTRIB_DESCRIPTION="Ubuntu 18.04.1 LTS"

  # systemd --version
  systemd 237
  +PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN2 +IDN -PCRE2 default-hierarchy=hybrid

  
  AWS instance m5.xlarge

  Please let me know if you need any information.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1794169/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp