On 01/04/14 17:48, Markus Armbruster wrote:
Heinz Graalfs <[email protected]> writes:Hi Kevin, doing a virsh detach-device ... ends up in the following QEMU monitor commands: 1. device_del ... 2. drive_del ... qmp_device_del() performs the device unplug path. In case of a block device do_drive_del() tries to prevent further IO against the host device. However, bdrv_find() during drive_del() results in an error, because the device is already gone. Due to this error all the bdrv_xxx calls to quiesce the block driver as well as all other processing is skipped. Is the sequence that libvirt triggers OK? Shouldn't drive_del be executed first?No.
OK, I see. The drive is deleted implicitly (release_drive()). Doing a device_del() requires another drive_add() AND device_add(). (Doing just a device_add() complains about the missing drive. A subsequent info qtree lets QEMU abort.)
drive_del is nasty. Its purpose is to revoke access to an image even when the guest refuses to cooperate. To the guest, this looks like hardware failure.
Deleting a device during active IO is nasty and it should look like a hardware failure. I would expect lots of errors.
If you drive_del before device_del, even a perfectly well-behaved guest guest is exposed to a terminally broken device between drive_del and completion of unplug.
The early drive_del() would mean that no further IO would be possible.
Always try a device_del first, and only if that does not complete within reasonable time, and you absolutely must revoke access to the image, then whack it over the head with drive_del.
What is this reasonable time? On 390 we experience problems (QEMU abort) when asynch block IO completes and the virtqueues are already gone. I suppose the bdrv_drain_all() in bdrv_close() is a little late. I don't see such problems with an early bdrv_drain_all() (drive_del) and an unplug (device_del) afterwards.
