On 09.10.25 22:09, Raphael Norwitz wrote:
A few suggestions here. Overall, it looks sane to me.
First my applogizes, I should have said it earlier: I'm preparing a v2, and starting from this patch it's significantly reworked (the previous big part of refactoring (01-23,25) is still relevant) I have a parallel series for similar migration of virtio-net/tap (TAP device fds are migrated through UNIX socket), and there were a lot of discussions, and the ideas applies to vhost-user-blk series as well. The main change of v2 is significantly simplified interface: the whole feature is enabled/disable by one migration parameter, no need for per-device options. But this requires additional changes in code, as we have to postpone backend (chardev opening and initial communication to vhost-server) until the point in time when we know, are we going to get the fds from migration channel or not. Next, migration part was revorked into VMSD structures instead of .save() / .load() handlers. Now, my work is to look at the comments and understand, how much they apply to upcoming v2.
On Wed, Aug 13, 2025 at 12:56 PM Vladimir Sementsov-Ogievskiy <[email protected]> wrote:
[..]
@@ -1624,21 +1652,30 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque, r = vhost_set_backend_type(hdev, backend_type); assert(r >= 0); - r = hdev->vhost_ops->vhost_backend_init(hdev, opaque, errp); - if (r < 0) { - goto fail; + if (hdev->migrating_backend) { + /* backend must support detached state */Probably better to error_report() or something other than a raw assert?
Assert is better, as this is not possible. Still, no such handlers in v2.
+ assert(hdev->vhost_ops->vhost_save_backend); + assert(hdev->vhost_ops->vhost_load_backend); + hdev->_features_wait_incoming = true; } - r = hdev->vhost_ops->vhost_set_owner(hdev); + r = hdev->vhost_ops->vhost_backend_init(hdev, opaque, errp); if (r < 0) { - error_setg_errno(errp, -r, "vhost_set_owner failed");
[..]
@@ -1920,6 +1960,12 @@ uint64_t vhost_get_features(struct vhost_dev *hdev, const int *feature_bits, uint64_t features) { const int *bit = feature_bits; +Should this be if (hdev->_features_wait_incoming && hdev->migrating_backend) { to not impact existing flows?
This code is still in v2. But _features_wait_incoming is a new field introduced withi this commit, so there are no existing flows with it.. And in v2 _features_wait_incoming and migrating_backend are less connected. Initialization code in v2 doesn't rely on .migrating_backend (as we just don't know :). stop()/start() code will still rely on .migrating_backend.
+ if (hdev->_features_wait_incoming) { + /* Excessive set is enough for early initialization. */ + return features; + } + while (*bit != VHOST_INVALID_FEATURE_BIT) { uint64_t bit_mask = (1ULL << *bit); if (!vhost_dev_has_feature(hdev, *bit)) { @@ -1930,6 +1976,66 @@ uint64_t vhost_get_features(struct vhost_dev *hdev, const int *feature_bits, return features; }
[..]
@@ -2204,14 +2317,29 @@ static int do_vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev, event_notifier_cleanup( &hdev->vqs[VHOST_QUEUE_NUM_CONFIG_INR].masked_config_notifier); + if (hdev->migrating_backend) {Ditto - no raw assert()?
no handlers - no problmes (in v2 :). Still, I'm sure that assert was good here, as we never set migrating_backend for devices which don't support it.
+ /* backend must support detached state */ + assert(hdev->vhost_ops->vhost_save_backend); + assert(hdev->vhost_ops->vhost_load_backend); + } + trace_vhost_dev_stop(hdev, vdev->name, vrings); if (hdev->vhost_ops->vhost_dev_start) { hdev->vhost_ops->vhost_dev_start(hdev, false); } - if (vrings) { + if (vrings && !hdev->migrating_backend) { vhost_dev_set_vring_enable(hdev, false); } + + if (hdev->migrating_backend) { + for (i = 0; i < hdev->nvqs; ++i) { + struct vhost_virtqueue *vq = hdev->vqs + i; + + event_notifier_set_handler(&vq->error_notifier, NULL); + } + } + for (i = 0; i < hdev->nvqs; ++i) { rc |= do_vhost_virtqueue_stop(hdev, vdev,
-- Best regards, Vladimir
