On 12/24/20 8:34 PM, buz.hr...@seznam.cz wrote:
Hi Debian people ;-),
After having some issues with Fedora last year I decided to reinstall all my
servers with Debian 10. I'm super happy with Debian except for one recurring issue I
have with QEMU-KVM hosts that is very difficult to reproduce, so I would like to
discuss it first before I open a new bug. Could you please discuss it with me?
;-)
I noticed that when I run VMs for a long period of time (a couple of days), one
or more VMs quite often get stuck. It is not possible to connect to the stuck VMs
using virt-manager, and their serial consoles don't respond.
It is not possible to shut them down ("virsh shutdown vm"). Sometimes the stuck VMs can be powered
down ("virsh destroy vm"), but in most cases "virsh destroy" doesn't work. In that case
the only thing to do is to shut down the rest of the running VMs (those that do respond) and reboot the host.
From my past experience: we had a very similar issue on an old Ubuntu release,
which would sometimes show hung tasks and extremely low IO performance
after a snapshot had been made.
It turned out to be a mega-obscure bug around a race condition somewhere in
the kernel, because a fallocated file backing a filesystem is not the same as
a fully 'dd-ed' one (we used raw images). There is about a 70% chance your
case is not that bug, but to confirm or reject it, try running VMs with raw
images (no qcow2) that were filled with dd if=/dev/zero of=image.img before being
seeded with the OS. If they stop showing this type of behavior, you have
a direction to investigate. If the problem persists, then it's something
different.
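
To make the dd step concrete, here is a rough sketch of how such a fully written raw image could be created. The file name, the 16 MiB size and the 1M block size are placeholders I picked for illustration, not values from our old setup, and the options assume GNU coreutils as shipped with Debian:

```shell
#!/bin/sh
set -eu

# Assumptions for illustration only: file name, size and block size are
# placeholders. Use the real disk size you need for the guest.
IMG="${IMG:-image.img}"
SIZE_MIB="${SIZE_MIB:-16}"

# Write real zeros into every block, so the space is allocated by actual
# writes rather than reserved with fallocate.
dd if=/dev/zero of="$IMG" bs=1M count="$SIZE_MIB" conv=fsync status=none

# Print apparent size and allocated blocks so the two can be compared.
stat -c 'apparent=%s bytes, allocated=%b blocks of %B bytes' "$IMG"
```

After that, the OS would be installed into the image as a raw disk, and the VM left running long enough to see whether the hang comes back.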