Thank you Kevin, I wondered if this might be Libvirt, but wasn't really sure what level of atomicity things were handled at across the APIs (not coming up with much when looking for an overview of the architecture covering Qemu+Libvirt that's high-level, but not so high that it can't be related back to the code structure). Looking into Libvirt's qemu/qemu_migration.c I think I see the problem; will head over to libvir-l...@redhat.com...
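For anyone else hitting this, my current understanding (a sketch, not gospel) is that the deciding factor is which storage-copy flag the management layer passes down to libvirt, e.g. with virsh (the guest and host names below are made up):

    # live migration WITH full storage copy -- mirrors every read-write
    # disk, including network-backed ones (rbd/iSCSI), to the destination
    virsh migrate --live --copy-storage-all myguest qemu+ssh://dest/system

    # plain live migration, no storage copy -- fine when all disks are
    # reachable from the destination, as rbd/iSCSI volumes are
    virsh migrate --live myguest qemu+ssh://dest/system

--copy-storage-all maps to the VIR_MIGRATE_NON_SHARED_DISK flag on virDomainMigrate(), so if OpenStack requests a block migration, libvirt will mirror every read-write disk it isn't told to treat as shared.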
On 22 April 2014 19:17, Kevin Wolf <kw...@redhat.com> wrote:
> Am 20.04.2014 um 14:33 hat Blair Bethwaite geschrieben:
>> Hi, just wondering if devs think this behaviour is bug-worthy?
>>
>> ---------- Forwarded message ----------
>> From: Blair Bethwaite <blair.bethwa...@gmail.com>
>> Date: 16 April 2014 16:29
>> Subject: Local storage-migration plus network disks
>> To: qemu-disc...@nongnu.org
>>
>>
>> Hi all,
>>
>> We have a production OpenStack cloud, currently on Qemu 1.0 & 1.5, using
>> local storage with storage-migration when we need to move machines around.
>> We noticed that with network storage attached (we have seen this with both
>> iSCSI and Ceph RBD targets) the migration moves all of the network storage
>> contents as well, which for any non-toy disk sizes pretty much renders it
>> useless, as the migration is then bounded not only by the guest memory size
>> and activity but also by the block storage size.
>>
>> I've been tracking (or at least trying to) the changes to storage migration
>> over the last few releases in the hope this may be fixed, and I just
>> recently found this: http://wiki.libvirt.org/page/NBD_storage_migration,
>> which suggests that "{shared, readonly, source-less}" disks won't be
>> transferred.
>>
>> But even with Qemu 1.5 we see the behaviour I described above, e.g., we've
>> just migrated a guest with a 300GB Ceph RBD attached and it has taken over
>> an hour to complete (over a 20GE network), and we observe similar amounts
>> of RX and TX on both the source and destination, as the source reads blocks
>> from the Ceph cluster, streams them to the destination, and the destination
>> in turn writes them back to the Ceph cluster.
>>
>> So why is Qemu performing storage migration on "network" type devices?
>
> qemu only does what you tell it. In this case, this is an explicit block
> job copying data around for a given device. It looks like your problem
> is that this block job is started in the first place. You'll have to
> look in a different layer for the code requesting this behaviour from
> qemu (that layer could be e.g. libvirt or even OpenStack itself).
>
> Kevin

--
Cheers,
~Blairo
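P.S. for the archives: the per-device block job Kevin refers to is qemu's drive-mirror. During NBD-based storage migration, libvirt starts an NBD server on the destination and then issues something like the following QMP command on the source for each disk it decides to copy (the device name and port here are illustrative, not taken from our setup):

    {"execute": "drive-mirror",
     "arguments": {"device": "drive-virtio-disk0",
                   "target": "nbd://destination.example:49152/drive-virtio-disk0",
                   "sync": "full",
                   "mode": "existing"}}

With "sync": "full" the entire device is copied regardless of where it actually lives, which matches the 300GB-over-an-hour behaviour we saw, so the fix belongs in whichever layer decides to start the job for a network disk at all.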