Am 24.01.24 um 16:03 schrieb Fiona Ebner: > Am 17.01.24 um 17:07 schrieb Vladimir Sementsov-Ogievskiy: >> Add a parameter that enables discard-after-copy. That is mostly useful >> in "push backup with fleecing" scheme, when source is snapshot-access >> format driver node, based on copy-before-write filter snapshot-access >> API: >> >> [guest] [snapshot-access] ~~ blockdev-backup ~~> [backup target] >> | | >> | root | file >> v v >> [copy-before-write] >> | | >> | file | target >> v v >> [active disk] [temp.img] >> >> In this case discard-after-copy does two things: >> >> - discard data in temp.img to save disk space >> - avoid further copy-before-write operation in discarded area >> >> Note that we have to declare WRITE permission on source in >> copy-before-write filter, for discard to work. >> >> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsement...@yandex-team.ru> > > Ran into another issue when the cluster_size of the fleecing image is > larger than for the backup target, e.g. > >> #!/bin/bash >> rm /tmp/fleecing.qcow2 >> ./qemu-img create /tmp/disk.qcow2 -f qcow2 1G >> ./qemu-img create /tmp/fleecing.qcow2 -o cluster_size=2M -f qcow2 1G >> ./qemu-img create /tmp/backup.qcow2 -f qcow2 1G >> ./qemu-system-x86_64 --qmp stdio \ >> --blockdev >> qcow2,node-name=node0,file.driver=file,file.filename=/tmp/disk.qcow2 \ >> --blockdev >> qcow2,node-name=node1,file.driver=file,file.filename=/tmp/fleecing.qcow2,discard=unmap >> \ >> --blockdev >> qcow2,node-name=node2,file.driver=file,file.filename=/tmp/backup.qcow2 \ >> <<EOF >> {"execute": "qmp_capabilities"} >> {"execute": "blockdev-add", "arguments": { "driver": "copy-before-write", >> "file": "node0", "target": "node1", "node-name": "node3" } } >> {"execute": "blockdev-add", "arguments": { "driver": "snapshot-access", >> "file": "node3", "discard": "unmap", "node-name": "snap0" } } >> {"execute": "blockdev-backup", "arguments": { "device": "snap0", "target": >> "node2", "sync": "full", "job-id": "backup0", "discard-source": true } } >> EOF > > will fail with > >> qemu-system-x86_64: ../util/hbitmap.c:570: hbitmap_reset: Assertion >> `QEMU_IS_ALIGNED(count, gran) || (start + count == hb->orig_size)' failed. > > Backtrace shows the assert happens while discarding, when resetting the > BDRVCopyBeforeWriteState access_bitmap > > #6 0x0000555556142a2a in hbitmap_reset (hb=0x555557e01b80, start=0, > count=1048576) at ../util/hbitmap.c:570 >> #7 0x0000555555f80764 in bdrv_reset_dirty_bitmap_locked >> (bitmap=0x55555850a660, offset=0, bytes=1048576) at >> ../block/dirty-bitmap.c:563 >> #8 0x0000555555f807ab in bdrv_reset_dirty_bitmap (bitmap=0x55555850a660, >> offset=0, bytes=1048576) at ../block/dirty-bitmap.c:570 >> #9 0x0000555555f7bb16 in cbw_co_pdiscard_snapshot (bs=0x5555581a7f60, >> offset=0, bytes=1048576) at ../block/copy-before-write.c:330 >> #10 0x0000555555f8d00a in bdrv_co_pdiscard_snapshot (bs=0x5555581a7f60, >> offset=0, bytes=1048576) at ../block/io.c:3734 >> #11 0x0000555555fd2380 in snapshot_access_co_pdiscard (bs=0x5555582b4f60, >> offset=0, bytes=1048576) at ../block/snapshot-access.c:55 >> #12 0x0000555555f8b65d in bdrv_co_pdiscard (child=0x5555584fe790, offset=0, >> bytes=1048576) at ../block/io.c:3144 >> #13 0x0000555555f78650 in block_copy_task_entry (task=0x555557f588f0) at >> ../block/block-copy.c:597 > > My guess for the cause is that in block_copy_calculate_cluster_size() we > only look at the target. But now that we need to discard the source, > we'll also need to consider that for the calculation? >
Just querying the source and picking the maximum won't work either, because snapshot-access does not currently implement .bdrv_co_get_info and because copy-before-write (doesn't implement .bdrv_co_get_info and is a filter) will just return the info of its file child. But the discard will go to the target child. If I do 1. .bdrv_co_get_info in snapshot-access: return info from file child 2. .bdrv_co_get_info in copy-before-write: return maximum cluster_size from file child and target child 3. block_copy_calculate_cluster_size: return maximum from source and target then the issue does go away, but I don't know if that's not violating any assumptions and probably there's a better way to avoid the issue? Best Regards, Fiona