control: reassign -1 openssh/1:7.9p1-10+deb10u1
control: retitle -1 openssh-server: cgroup leftovers with socket activation when key exchange fails

On 2019-11-30 13:52, Aurelien Jarno wrote:
> Package: src:linux
> Version: 4.19.67-2+deb10u2
> Severity: important
>
> Dear Maintainer,
>
> Since the latest Debian stable release, I observe a slab memory leak of
> about 30MB/hour when running the kernel 4.19.67-2+deb10u2 on an OVH VPS,
> which causes all applications to slowly move to swap after a few days,
> and eventually an OOM. You'll find a typical munin memory plot attached
> to the bug report.

[...]

> The problem has been introduced in Debian in kernel 4.19.67-1. I have
> found that the problem has been introduced upstream in the 4.19.66
> release.

It turns out that the original problem is due to SSH; this new kernel
version just makes it far more visible.

When using systemd socket activation, the OpenSSH daemon sometimes does
not remove the cgroup created for the connection after the key exchange
has failed. This normally happens rarely, less than 1% of the time.
However, on a single-CPU system (e.g. a VM with a single vCPU), the
problem happens 100% of the time.

To reproduce the problem using a VM:
- Reduce the number of vCPUs to 1.
- Switch the OpenSSH daemon to systemd socket activation using
  systemctl enable ssh.socket
  followed by a reboot.
- Try to connect to the system with a key exchange algorithm not
  supported on buster. For example:
  ssh -o KexAlgorithms=diffie-hellman-group-exchange-sha1 host
- Look at /sys/fs/cgroup/memory/system.slice/system-ssh.slice. Each
  connection leaves an entry in that directory, and each entry takes
  some kernel memory.
- Depending on the available memory and swap, after a few thousand
  connections the OOM killer kills all the processes, making the system
  unusable.

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurel...@aurel32.net                 http://www.aurel32.net
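
PS: for anyone who wants to watch the leftovers accumulate while going
through the reproduction steps, here is a minimal sketch of a counter.
It is not part of the original analysis. The function name
count_cgroup_entries is made up for illustration, and the sketch assumes
the cgroup v1 layout from the path given in the steps and a find that
supports -mindepth/-maxdepth (GNU find does).

```shell
#!/bin/sh
# Hypothetical helper: count the per-connection cgroup directories left
# under a slice directory. The default path is the one named in the
# report (cgroup v1 memory controller); pass another path as $1 if needed.
count_cgroup_entries() {
    dir=${1:-/sys/fs/cgroup/memory/system.slice/system-ssh.slice}
    # Each child cgroup appears as a subdirectory. Regular files are the
    # controller's own control files and are not counted. A missing
    # directory yields 0 rather than an error.
    find "$dir" -mindepth 1 -maxdepth 1 -type d 2>/dev/null | wc -l | tr -d ' '
}
```

Calling count_cgroup_entries before and after a failed connection
attempt should show whether the count grows by one per attempt, which is
what the steps above describe on a single-vCPU VM.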