abh1sar commented on issue #12899:
URL: https://github.com/apache/cloudstack/issues/12899#issuecomment-4161915836
@jmsperu Thanks for working on this. This is a very useful improvement to
the NAS Backup provider.
My comments below:
> `nas.backup.full.interval`
1. I didn't understand this. Backups can be scheduled hourly, and can be
triggered ad-hoc as well, so a fixed day-based full-backup interval doesn't fit.
> **Full backup (Day 0 or every `nas.backup.full.interval` days):**
>
> 1. Export the entire disk to the NAS mount
> qemu-img convert -f qcow2 -O qcow2 \
> /var/lib/libvirt/images/vm-disk.qcow2 \
> /mnt/nas/backups/vm-uuid/backup-full-20260327.qcow2
>
> 2. Create a new dirty bitmap to track changes from this point
> virsh qemu-monitor-command $DOMAIN --hmp \
> 'block-dirty-bitmap-add drive-virtio-disk0 backup-20260327 persistent=true'
>
2. Why can’t we use libvirt’s backup-begin API for running VMs here as well?
```
# 1. Use libvirt backup-begin with incremental mode
# This exports only blocks dirty since bitmap "backup-20260327"
cat > /tmp/backup.xml <<'XML'
<domainbackup mode="push">
  <disks>
    <disk name="vda" backup="yes" type="file">
      <target file="/mnt/nas/backups/vm-uuid/backup-inc-20260328.qcow2"/>
      <driver type="qcow2"/>
    </disk>
  </disks>
  <incremental>backup-20260327</incremental>
</domainbackup>
XML
virsh backup-begin $DOMAIN /tmp/backup.xml
# 2. Wait for the push job to finish (poll domjobinfo; note that
#    "domjobinfo --completed" only reports stats of the last finished job)
virsh domjobinfo $DOMAIN --completed
# 3. Rotate bitmaps: remove old, create new. The dirty-bitmap commands are
#    QMP commands, so use the default QMP mode rather than --hmp:
virsh qemu-monitor-command $DOMAIN \
  '{"execute": "block-dirty-bitmap-remove",
    "arguments": {"node": "drive-virtio-disk0", "name": "backup-20260327"}}'
virsh qemu-monitor-command $DOMAIN \
  '{"execute": "block-dirty-bitmap-add",
    "arguments": {"node": "drive-virtio-disk0", "name": "backup-20260328",
                  "persistent": true}}'
```
3. Day-based checkpoints won’t work, as backups can be hourly and ad-hoc.
A timestamp-based bitmap name can be used instead.
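A minimal sketch of what I mean (the `backup-` prefix and timestamp format are just illustrative):

```shell
# Hypothetical naming sketch: a UTC timestamp down to the second, so
# hourly and ad-hoc backups never collide on a bitmap name.
BITMAP="backup-$(date -u +%Y%m%dT%H%M%SZ)"
echo "$BITMAP"   # e.g. backup-20260328T141502Z
```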
4. I don't think explicit bitmap manipulation is required here. backup-begin
takes care of creating the bitmap as defined in the checkpoint XML passed
alongside the backup XML.
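As a rough sketch (checkpoint name illustrative): pass a checkpoint definition alongside the backup XML, and libvirt creates and tracks the bitmap itself:

```xml
<!-- checkpoint.xml: libvirt creates the dirty bitmap for us;
     the bitmap name defaults to the checkpoint name -->
<domaincheckpoint>
  <name>backup-20260328</name>
  <disks>
    <disk name="vda" checkpoint="bitmap"/>
  </disks>
</domaincheckpoint>
```

Invoked as `virsh backup-begin $DOMAIN backup.xml checkpoint.xml`; on the next run, the previous checkpoint's name goes into `<incremental>`.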
> Example: Restore to Day 3 (full + 3 incrementals):
```
# 1. Create a working copy from the full backup
cp /mnt/nas/backups/vm-uuid/backup-full-20260327.qcow2 /tmp/restored.qcow2
# 2. Point each incremental at its predecessor using qemu-img rebase.
#    Each incremental is a thin qcow2 containing only changed blocks;
#    -u (unsafe mode) only rewrites the backing-file pointer, so reads of
#    unallocated clusters fall through to the image below in the chain.
qemu-img rebase -u -b /tmp/restored.qcow2 \
/mnt/nas/backups/vm-uuid/backup-inc-20260328.qcow2
qemu-img rebase -u -b /mnt/nas/backups/vm-uuid/backup-inc-20260328.qcow2 \
/mnt/nas/backups/vm-uuid/backup-inc-20260329.qcow2
qemu-img rebase -u -b /mnt/nas/backups/vm-uuid/backup-inc-20260329.qcow2 \
/mnt/nas/backups/vm-uuid/backup-inc-20260330.qcow2
# 3. Flatten the chain into a single image
qemu-img convert -f qcow2 -O qcow2 \
/mnt/nas/backups/vm-uuid/backup-inc-20260330.qcow2 \
/tmp/vm-restored-final.qcow2
# 4. Return the flattened image for CloudStack to import
```
5. qemu-img rebase can be done right after each backup. Then the backup
files themselves will carry the chain information, and the restore task becomes
simple.
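A rough sketch of that flow (paths illustrative, not the proposed implementation):

```shell
# After each incremental push backup completes, link it to its predecessor
# so the qcow2 files themselves record the chain:
PREV=/mnt/nas/backups/vm-uuid/backup-full-20260327.qcow2
NEW=/mnt/nas/backups/vm-uuid/backup-inc-20260328.qcow2
# -u rewrites only the backing-file pointer; -F also records the backing
# format, avoiding format probing at restore time
qemu-img rebase -u -F qcow2 -b "$PREV" "$NEW"
# A later restore is then a single flatten of the newest incremental:
qemu-img convert -f qcow2 -O qcow2 "$NEW" /tmp/vm-restored-final.qcow2
```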
In addition I have the below concerns:
6. How will incremental backups work for stopped VMs? There is no running
QEMU process to track dirty blocks while the VM is down.
7. Can you add some detail on how backup deletion will work when the backup
is an incremental, and when it is a full backup with dependent children?
8. VMs in CloudStack are not persistent on a KVM host: when a VM stops and
starts, the domain XML is recreated by CloudStack. CloudStack currently doesn't
recreate the domain checkpoints, so the checkpoints will need to be redefined
when the VM starts for incremental backup with backup-begin to work.
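One possible approach, as a sketch only (the directory path and the idea of persisting checkpoint XML are assumptions, not existing CloudStack behavior): save each checkpoint's XML when it is created (e.g. via `virsh checkpoint-dumpxml`), then replay the definitions on VM start:

```shell
# Hypothetical: replay saved checkpoint definitions after the domain starts
DOMAIN=vm-uuid   # illustrative domain name
for xml in /var/lib/cloudstack/checkpoints/vm-uuid/*.xml; do
    # --redefine restores libvirt's checkpoint metadata without touching
    # the persistent bitmap already stored inside the qcow2
    virsh checkpoint-create "$DOMAIN" "$xml" --redefine
done
```

The persistent bitmap itself survives inside the qcow2 across restarts; `--redefine` only restores the libvirt-side checkpoint metadata so backup-begin can reference it again.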
-------------------------------------------
> **Open Questions for Discussion**
> * **Interest level.** Is there sufficient demand for this feature to
justify the implementation effort? We believe so based on mailing list threads,
but would like confirmation.
Definitely.
> * **Dirty bitmaps vs. alternatives.** Are there concerns about relying on
QEMU dirty bitmaps? Alternative approaches include file-level deduplication on
the NAS (less efficient, not hypervisor-aware) or `qemu-img compare` (slower,
requires reading both images).
I don't think so. It is the way to go.
> * **Target release.** Should this target CloudStack 4.23, or is a later
release more appropriate given the scope?
Given the scope, we should definitely target 4.23. 4.23 is a non-LTS release
meant for adding new functionality and features.
> * **Chain model.** We proposed forward-incremental with periodic full
backups. Would the community prefer a different model (e.g.,
reverse-incremental like Veeam, or forever-incremental with periodic synthetic
fulls)?
I think this is the simplest model. Suitable for us.
> * **Scope of first PR.** Should we submit the entire feature as one PR, or
break it into smaller PRs (e.g., nasbackup.sh changes first, then agent, then
management server, then UI)?
A single PR is better. That way reviewers have full context and the feature
can be tested end-to-end.
> * **Testing infrastructure.** We can test against our production
environment (Ubuntu 22.04, QEMU 6.2, libvirt 8.0). Are there CI environments or
community test labs available for broader testing (RHEL, Rocky, older QEMU
versions)?
GitHub CI will run the build, unit tests, and smoke tests. There is an
integration test for the NAS plugin, `smokes/test_backup_recovery_nas.py`. Feel
free to add a test case for incremental backups; it will run with the other
smoke tests.
In addition, someone will volunteer to do extensive manual testing of this
feature before merging.
Looking forward to your reply. Let me know if I can help in any other way.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]