On Wednesday, 14 June 2023 Simon Rowe wrote:
> We've also seen a handful of similar reports. Again, just the MBR sector
> overwritten by what looks to be guest data (e.g. log messages). The
> common thread in our incidents is again a SATA disk under the AHCI
> controller, with a network backend (iSCSI) that has experienced a
> failure.
>
> I've tried to repro this with blkdebug and simulated write errors,
> without success.
I’ve finally had some success in reproducing this issue. My test
environment is set up as follows:
* QEMU 4.2
* a guest booting from CD with a small SATA disk
* a guest test harness that partitions the disk, then continually writes data
  to the partition while checking the integrity of the MBR
* a filter script that interposes between QEMU and the iSCSI backend; it drops
  writes and then resets the connection after a period of time
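The harness itself isn't included here. Purely as an illustration, an MBR integrity check of the kind described might look like the Python sketch below. The function names and the approach of comparing LBA 0 against a reference copy taken right after partitioning are assumptions, not the actual harness. It checks the 0x55AA boot signature at bytes 510-511 and reports the first byte at which LBA 0 differs from the reference. Note that it uses plain buffered reads; inside a real guest the read would probably need to bypass the page cache (e.g. O_DIRECT, or dropping caches) to see what is actually on the disk.

```python
SECTOR_SIZE = 512
BOOT_SIGNATURE = b"\x55\xaa"  # MBR boot signature at byte offsets 510-511


def read_lba0(path):
    """Read sector 0 (the MBR) of a disk device or image file (buffered read)."""
    with open(path, "rb") as f:
        return f.read(SECTOR_SIZE)


def check_mbr(path, reference):
    """Compare LBA 0 against a reference copy; return a list of problems (empty if intact)."""
    current = read_lba0(path)
    if len(current) != SECTOR_SIZE:
        return ["short read of LBA 0"]
    problems = []
    if current[510:512] != BOOT_SIGNATURE:
        problems.append("boot signature 0x55AA missing")
    if current != reference:
        # Index of the first differing byte; falls back to the shorter length
        # if one buffer is a prefix of the other.
        first = next(
            (i for i, (a, b) in enumerate(zip(current, reference)) if a != b),
            min(len(current), len(reference)),
        )
        problems.append(f"LBA 0 differs from reference starting at byte {first}")
    return problems
```

In use, the hypothetical harness would call `read_lba0()` on the disk once after partitioning to capture `reference`, then call `check_mbr()` periodically between data writes and stop as soon as it returns a non-empty list.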
From tracing in the filter script I can see unsolicited writes to LBA 0 once
the SATA controller is reset:
Data in: iSCSI op 01 SCSI op 28 LBA 0 NOP count 5 wait for read False
Data in: iSCSI op 01 SCSI op 28 LBA 0 NOP count 6 wait for read False
Data in: iSCSI op 01 SCSI op 2a LBA 0 NOP count 0 wait for read True
Data in: iSCSI op 01 SCSI op 28 LBA 0 NOP count 0 wait for read False
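For reference, iSCSI opcode 0x01 is a SCSI Command PDU, and SCSI opcodes 0x28 and 0x2a are READ(10) and WRITE(10), so the third trace line is the write to LBA 0. A hypothetical helper for scanning longer traces for such writes could look like the sketch below. It assumes the exact line format shown above and that the LBA is printed in decimal; the function names are illustrative only.

```python
import re

# Assumes the filter script's trace format shown above; LBA assumed decimal.
TRACE_RE = re.compile(
    r"iSCSI op (?P<iscsi_op>[0-9a-fA-F]{2}) "
    r"SCSI op (?P<scsi_op>[0-9a-fA-F]{2}) "
    r"LBA (?P<lba>\d+)"
)

ISCSI_SCSI_COMMAND = 0x01  # iSCSI opcode of a SCSI Command PDU
SCSI_WRITE_OPS = {0x2A: "WRITE(10)", 0xAA: "WRITE(12)", 0x8A: "WRITE(16)"}


def decode(line):
    """Return (iscsi_op, scsi_op, lba) for a trace line, or None if it doesn't match."""
    m = TRACE_RE.search(line)
    if m is None:
        return None
    return int(m["iscsi_op"], 16), int(m["scsi_op"], 16), int(m["lba"])


def writes_to_lba0(lines):
    """Return the trace lines that are SCSI write commands targeting LBA 0."""
    hits = []
    for line in lines:
        fields = decode(line)
        if fields is None:
            continue
        iscsi_op, scsi_op, lba = fields
        if iscsi_op == ISCSI_SCSI_COMMAND and scsi_op in SCSI_WRITE_OPS and lba == 0:
            hits.append(line)
    return hits
```

Fed the four trace lines above, `writes_to_lba0()` returns only the `SCSI op 2a` line.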
I have a stack trace from the point at which the write occurs:
#0 iscsi_co_writev (bs=0x564322ecc220, sector_num=<optimized out>,
nb_sectors=1, iov=0x7fc20c045860, flags=<optimized out>)
at block/iscsi.c:641
#1 0x00005643220e780b in bdrv_driver_pwritev (bs=bs@entry=0x564322ecc220,
offset=offset@entry=0, bytes=bytes@entry=512,
qiov=qiov@entry=0x7fc20c045860, qiov_offset=qiov_offset@entry=0,
flags=flags@entry=0) at block/io.c:1216
#2 0x00005643220e9985 in bdrv_aligned_pwritev (
child=child@entry=0x564322ecb050, req=req@entry=0x7fc2aa90bb00, offset=0,
bytes=512, align=align@entry=512, qiov=0x7fc20c045860, qiov_offset=0,
flags=flags@entry=0) at block/io.c:1980
#3 0x00005643220ea25b in bdrv_co_pwritev_part (child=0x564322ecb050,
offset=offset@entry=0, bytes=bytes@entry=512,
qiov=qiov@entry=0x7fc20c045860, qiov_offset=qiov_offset@entry=0, flags=0)
at block/io.c:2137
#4 0x00005643220ea55b in bdrv_co_pwritev (child=<optimized out>,
offset=offset@entry=0, bytes=bytes@entry=512,
qiov=qiov@entry=0x7fc20c045860, flags=<optimized out>) at block/io.c:2087
#5 0x00005643220aa64d in raw_co_pwritev (bs=0x564322ec4a00, offset=0,
bytes=512, qiov=0x7fc20c045860, flags=<optimized out>)
at block/raw-format.c:258
#6 0x00005643220e7702 in bdrv_driver_pwritev (bs=bs@entry=0x564322ec4a00,
offset=offset@entry=0, bytes=bytes@entry=512,
qiov=qiov@entry=0x7fc20c045860, qiov_offset=qiov_offset@entry=0,
flags=flags@entry=0) at block/io.c:1183
#7 0x00005643220e9985 in bdrv_aligned_pwritev (
child=child@entry=0x564322ed28c0, req=req@entry=0x7fc2aa90be70, offset=0,
bytes=512, align=align@entry=1, qiov=0x7fc20c045860, qiov_offset=0,
flags=flags@entry=0) at block/io.c:1980
#8 0x00005643220ea25b in bdrv_co_pwritev_part (child=0x564322ed28c0,
offset=offset@entry=0, bytes=bytes@entry=512,
qiov=qiov@entry=0x7fc20c045860, qiov_offset=qiov_offset@entry=0, flags=0)
at block/io.c:2137
#9 0x00005643220d63b4 in blk_do_pwritev_part (blk=0x564322ec4570, offset=0,
bytes=512, qiov=0x7fc20c045860, qiov_offset=qiov_offset@entry=0,
flags=<optimized out>) at block/block-backend.c:1231
#10 0x00005643220d650d in blk_aio_write_entry (opaque=0x7fc20c045520)
at block/block-backend.c:1439
#11 0x000056432218706a in coroutine_trampoline (i0=<optimized out>,
i1=<optimized out>) at util/coroutine-ucontext.c:115
#12 0x00007fc2afa20190 in ?? () from /lib64/libc.so.6
#13 0x00007fc2b3e01aa0 in ?? ()
#14 0x0000000000000000 in ?? ()
I’m not familiar with QEMU's storage code; do you have any suggestions on how
to proceed with debugging this?
Regards
Simon