On 14.06.23 at 16:48, Simon J. Rowe wrote:
> On 02/02/2023 12:08, Fiona Ebner wrote:
>> Hi,
>> over the years we've got 1-2 dozen reports[0] about suddenly
>> missing/corrupted MBR/partition tables. The issue seems to be very rare
>> and there was no success in trying to reproduce it yet. I'm asking here
>> in the hope that somebody has seen something similar.
>>
>> The only commonality seems to be the use of an ide-hd drive with ahci
>> bus.
>>
>> It does seem to happen with both Linux and Windows guests (one of the
>> reports even mentions FreeBSD) and backing storages for the VMs include
>> ZFS, RBD, LVM-Thin as well as file-based storages.
>>
>> Relevant part of an example configuration:
>>
>>> -device 'ahci,id=ahci0,multifunction=on,bus=pci.0,addr=0x7' \
>>> -drive 'file=/dev/zvol/myzpool/vm-168-disk-0,if=none,id=drive-sata0,format=raw,cache=none,aio=io_uring,detect-zeroes=on' \
>>> -device 'ide-hd,bus=ahci0.0,drive=drive-sata0,id=sata0' \
>>
>> The first reports are from before io_uring was used and there are also
>> reports with writeback cache mode and discard=on,detect-zeroes=unmap.
>>
>> Some reports say that the issue occurred under high IO load.
>>
>> Many reports suspect backups causing the issue. Our backup mechanism
>> uses backup_job_create() for each drive and runs the jobs sequentially.
>> It uses a custom block driver as the backup target which just forwards
>> the writes to the actual target which can be a file or our backup server.
>> (If you really want to see the details, apply the patches in [1] and see
>> pve-backup.c and block/backup-dump.c).
>>
>> Of course, the backup job will read sector 0 of the source disk, but I
>> really can't see where a stray write would happen, why the issue would
>> trigger so rarely or why seemingly only ide-hd+ahci would be affected.
>>
>> So again, just asking if somebody has seen something similar or has a
>> hunch of what the cause might be.
>>
>> [0]: https://bugzilla.proxmox.com/show_bug.cgi?id=2874
>> [1]: https://git.proxmox.com/?p=pve-qemu.git;a=tree;f=debian/patches;hb=HEAD
>>
>
> We've also seen a handful of similar reports. Again, just the MBR sector
> overwritten by what looks to be guest data (e.g. log messages). The
> common thread with our incidents is again a SATA disk under the AHCI
> controller, we have a network backend (iSCSI) which has experienced a
> failure.
>
> I've tried to repro this with blkdebug and simulated write errors,
> without success.
>
Hi,

which version/build of QEMU are you using? Can you correlate the issue
with any block job or was the drive in use by the guest only?

Best Regards,
Fiona
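
For anyone who wants to check images for the symptom described above (sector 0 overwritten by what looks like guest data), here is a minimal sketch in Python. It is not part of QEMU or any of the tooling mentioned in this thread. The function names `inspect_mbr` and `read_sector0` are hypothetical, and the "printable ratio" is only a rough heuristic for spotting text such as log messages. It assumes a classic MBR layout: a 512-byte sector, four 16-byte partition entries starting at offset 446, and the boot signature 0x55 0xAA at offset 510.

```python
import string
import struct

SECTOR_SIZE = 512
PARTITION_TABLE_OFFSET = 446   # start of the four 16-byte MBR partition entries
SIGNATURE_OFFSET = 510         # boot signature 0x55 0xAA lives in the last two bytes
PRINTABLE = set(string.printable.encode("ascii"))


def inspect_mbr(sector: bytes) -> dict:
    """Summarize sector 0: signature, non-empty partition entries, text-likeness."""
    if len(sector) < SECTOR_SIZE:
        raise ValueError("need at least one full 512-byte sector")
    sector = sector[:SECTOR_SIZE]

    has_signature = sector[SIGNATURE_OFFSET:SIGNATURE_OFFSET + 2] == b"\x55\xaa"

    partitions = []
    for i in range(4):
        start = PARTITION_TABLE_OFFSET + 16 * i
        # status (1 byte), CHS start (3, skipped), type (1), CHS end (3, skipped),
        # first LBA (4, little-endian), sector count (4, little-endian)
        status, ptype, lba_start, num_sectors = struct.unpack(
            "<B3xB3xII", sector[start:start + 16]
        )
        if ptype != 0:  # type 0 marks an unused entry
            partitions.append(
                {"index": i, "status": status, "type": ptype,
                 "lba_start": lba_start, "num_sectors": num_sectors}
            )

    # Heuristic only: a sector full of log text is almost entirely printable ASCII.
    printable_ratio = sum(b in PRINTABLE for b in sector) / SECTOR_SIZE

    return {"has_signature": has_signature,
            "partitions": partitions,
            "printable_ratio": printable_ratio}


def read_sector0(path: str) -> bytes:
    """Read the first sector of a raw image or block device (path is caller-supplied)."""
    with open(path, "rb") as f:
        return f.read(SECTOR_SIZE)
```

Calling `inspect_mbr(read_sector0(path))` on a raw image or block device, ideally while the VM is stopped or against a snapshot, gives a quick indication: a missing signature combined with a high printable ratio would be consistent with the reports above, although it does not say anything about where the write came from. Note that a disk with a GPT carries a protective MBR, which this sketch would report as a single partition entry.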
