Tom Stocker wrote:
> Hello dear Debian folks
>
> We run a Debian 9.2 build server on top of a vmware ESXi install on a
> quite powerful server (Dell Poweredge R730 with 2x Xeon E5-2683v4 (16
> cores per CPU makes 64 vCPUs with HT enabled)). Both installations are fully
> updated. Also Dell firmwares are up to date.
>
> Now I do see stack traces in the Debian /var/log/messages (attached) file,
> but no time-corresponding entries in the underlying ESXi logs, so I tend
> to say it's a Debian (or a kernel) problem. The traces occur under heavy
> load and the server stops responding.
>
> Unfortunately we're evaluating vmware for this use-case so I cannot open a
> ticket there, as I'm running in eval mode. It's only one vm on this
> physical server. And no, it was not my idea to run it on vmware, I was
> told to do so.
>
> I did run the open-vm-tools and tried with the vmware proprietary ones, no
> difference.
>
> Linux hostname 4.9.0-4-amd64 #1 SMP Debian 4.9.51-1 (2017-09-28) x86_64
> GNU/Linux
>
> Any ideas what I can do? Any help would be greatly appreciated
>
IMO what you see is cgroup management getting rid of an overloading process, which is normal to some extent. If you mean that the VM itself stops responding, that needs investigation, or perhaps some tuning of cgroup behavior or the like.

Nov 27 09:43:03 hostname kernel: [1184306.163531] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/05/2016
Nov 27 09:43:03 hostname kernel: [1184306.163536] Workqueue: cgroup_destroy css_free_work_fn

Perhaps you could try kernel 4.12 or 4.13, or find out what is overloading the system.

Look, for example, here:
https://patchwork.kernel.org/patch/9896303/

Regards