Greetings! No luck with 5.4.0-80.90: we are still hitting the same bug, even on kernel version 5.4.0-86. We still have no clue how to reproduce it; hypervisor nodes just crash at random. I have attached the dmesg output from the most recent crash, but it looks identical to the previous ones.
Here is a fresh crash dump: https://drive.google.com/file/d/1skA238DVtxpY8t8ANdzX1gBC8muChxto/view?usp=sharing

** Attachment added: "crash-260122.log"
   https://bugs.launchpad.net/ubuntu/+source/linux-hwe-5.4/+bug/1921355/+attachment/5557608/+files/crash-260122.log

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-hwe-5.4 in Ubuntu.
https://bugs.launchpad.net/bugs/1921355

Title:
  cgroups related kernel panics

Status in linux package in Ubuntu:
  Incomplete
Status in linux-hwe-5.4 package in Ubuntu:
  Confirmed

Bug description:
  Hi!

  Over the last 6 months we have upgraded our hypervisor compute hosts from the Ubuntu Bionic 4.15.* kernel to the Ubuntu Bionic HWE 5.4 kernel.

  This month we noticed that several nodes failed due to bugs in cgroups. The trace was different almost every time, but it always revolves around cgroups: either a null pointer failure or a panic caught by the BUG_ON() macro. It looks as if some cgroup no longer existed but something still tried to access it, causing the kernel panic.

  Please find the logs attached.

  3 of the 4 cases happened after a VM shutdown. We tried spawning lots of VMs, putting them under load, and shutting them down, but did not manage to reproduce the behavior. In fact, every case is somewhat different: the kernel patch versions vary (5.4.0-42 to 5.4.0-66), and so does the uptime (from 1 day to about half a year). There are also lots of hosts with several months of uptime that show no issue. On 4.15 we never saw this behavior at all.

  This is quite disturbing, as I don't want dozens of VMs to crash (due to a host outage) at random times for some vague reason...

  I did not manage to find any related bugs on the bug tracker, so I am creating this one. I wonder whether anybody in the community has come across something like this. Could somebody advise how to debug this further, or where else to report it or look for a similar case?