On Tue, Feb 2, 2021 at 1:23 AM Eli Cohen <[email protected]> wrote:
>
> On Tue, Feb 02, 2021 at 12:38:51AM -0800, Si-Wei Liu wrote:
> > Thanks Eli and Jason for clarifications. See inline.
> >
> > On Mon, Feb 1, 2021 at 11:06 PM Eli Cohen <[email protected]> wrote:
> > >
> > > On Tue, Feb 02, 2021 at 02:02:25PM +0800, Jason Wang wrote:
> > > >
> > > > On 2021/2/2 下午12:15, Si-Wei Liu wrote:
> > > > > On Mon, Feb 1, 2021 at 7:13 PM Jason Wang <[email protected]> wrote:
> > > > > >
> > > > > > On 2021/2/2 上午3:17, Si-Wei Liu wrote:
> > > > > > > On Mon, Feb 1, 2021 at 10:51 AM Si-Wei Liu
> > > > > > > <[email protected]> wrote:
> > > > > > > > On Thu, Jan 28, 2021 at 5:46 AM Eli Cohen <[email protected]>
> > > > > > > > wrote:
> > > > > > > > > suspend_vq should only suspend the VQ on not save the current
> > > > > > > > > available
> > > > > > > > > index. This is done when a change of map occurs when the
> > > > > > > > > driver calls
> > > > > > > > > save_channel_info().
> > > > > > > > Hmmm, suspend_vq() is also called by teardown_vq(), the latter
> > > > > > > > of
> > > > > > > > which doesn't save the available index as save_channel_info()
> > > > > > > > doesn't
> > > > > > > > get called in that path at all. How does it handle the case that
> > > > > > > > aget_vq_state() is called from userspace (e.g. QEMU) while the
> > > > > > > > hardware VQ object was torn down, but userspace still wants to
> > > > > > > > access
> > > > > > > > the queue index?
> > > > > > > >
> > > > > > > > Refer to
> > > > > > > > https://lore.kernel.org/netdev/[email protected]/
> > > > > > > >
> > > > > > > > vhost VQ 0 ring restore failed: -1: Resource temporarily
> > > > > > > > unavailable (11)
> > > > > > > > vhost VQ 1 ring restore failed: -1: Resource temporarily
> > > > > > > > unavailable (11)
> > > > > > > >
> > > > > > > > QEMU will complain with the above warning while VM is being
> > > > > > > > rebooted
> > > > > > > > or shut down.
> > > > > > > >
> > > > > > > > Looks to me either the kernel driver should cover this
> > > > > > > > requirement, or
> > > > > > > > the userspace has to bear the burden in saving the index and
> > > > > > > > not call
> > > > > > > > into kernel if VQ is destroyed.
> > > > > > > Actually, the userspace doesn't have the insights whether virt
> > > > > > > queue
> > > > > > > will be destroyed if just changing the device status via
> > > > > > > set_status().
> > > > > > > Looking at other vdpa driver in tree i.e. ifcvf it doesn't behave
> > > > > > > like
> > > > > > > so. Hence this still looks to me to be Mellanox specifics and
> > > > > > > mlx5_vdpa implementation detail that shouldn't expose to
> > > > > > > userspace.
> > > > > >
> > > > > > So I think we can simply drop this patch?
> > > > > Yep, I think so. To be honest I don't know why it has anything to do
> > > > > with the memory hotplug issue.
> > > >
> > > >
> > > > Eli may know more, my understanding is that, during memory hotplut, qemu
> > > > need to propagate new memory mappings via set_map(). For mellanox, it
> > > > means
> > > > it needs to rebuild memory keys, so the virtqueue needs to be suspended.
> > > >
> > >
> > > I think Siwei was asking why the first patch was related to the hotplug
> > > issue.
> >
> > I was thinking how consistency is assured when saving/restoring this
> > h/w avail_index against the one in the virtq memory, particularly in
> > the region_add/.region_del() context (e.g. the hotplug case). Problem
> > is I don't see explicit memory barrier when guest thread updates the
> > avail_index, how does the device make sure the h/w avail_index is
> > uptodate while guest may race with updating the virtq's avail_index in
> > the mean while? Maybe I completely miss something obvious?
> DKIM-Signature: v1; arsa-sha256; crelaxed/relaxed; dnvidia.com; sn1;
> t 12257780; bhHnB0z4VEKwRS3WGY8d836MJgxu5Eln/jbFZlNXVxc08;
> hX-PGP-Universal:Date:From:To:CC:Subject:Message-ID:References:
> MIME-Version:Content-Type:Content-Disposition:
> Content-Transfer-Encoding:In-Reply-To:User-Agent:X-Originating-IP:
> X-ClientProxiedBy;
> bgGmb8+rcn3/rKzKQ/7QzSnghWzZ+FAU0XntsRZYGQ66sFvT7zsYPHogG3LIWNY77t
> wNHPw7GCJrNaH3nEXPbOp0FMOZw4Kv4W7UPuYPobbLeTkvuPAidjB8dM42vz+1X61t
> 9IVQT9X4hnAxRjI5CqZOo41GS4Tl1X+ykGoA+VE80BR/R/+nQ3tXDVULfppzeB+vu3
> TWnnpaZ2GyoNyPlMiyVRkHdXzDVgA4uQHxwHn7otGK5J4lzyu8KrFyQtiP+f6hfu5v
> crJkYS8e9A+rfzUmKWuyHcKcmhPhAVJ4XdpzZcDXXlMHVxG7nR1o88xttC6D1+oNIP
> 9xHI3DkNBpEuA
> If you're asking about syncronization upon hot plug of memory, the
> hardware always goes to read the available index from memory when a new
> hardware object is associted with a virtqueue. You can argue then that
> you don't need to restore the available index and you may be right but
> this is the currect inteface to the firmware.
>
>
> If you're asking on generally how sync is assured when the guest updates
> the available index, can you please send a pointer to the code where you
> see the update without a memory barrier?
This is a snippet of virtqueue_add_split() where avail_index gets
updated by guest:
/* Put entry in available array (but don't update avail->idx until they
* do sync). */
avail = vq->split.avail_idx_shadow & (vq->split.vring.num - 1);
vq->split.vring.avail->ring[avail] = cpu_to_virtio16(_vq->vdev, head);
/* Descriptors and available array need to be set before we expose the
* new available array entries. */
virtio_wmb(vq->weak_barriers);
vq->split.avail_idx_shadow++;
vq->split.vring.avail->idx = cpu_to_virtio16(_vq->vdev,
vq->split.avail_idx_shadow);
vq->num_added++;
There's memory barrier to make sure the update to descriptor and
available ring is seen before writing to the avail->idx, but there
seems no gurantee that this update would flush to the memory
immmedately either before or after the mlx5-vdpa is suspened and gets
the old avail_index value stashed somewhere. In this case, how does
the hardware ensure the consistency between the guest virtio and host
mlx5-vdpa? Or, it completly relies on guest to update the avail_index
once the next buffer is available, so that the index will be in sync
again?
Thanks,
-Siwei
>
> >
> > Thanks,
> > -Siwei
> >
> > >
> > > But you're correct. When memory is added, I get a new memory map. This
> > > requires me to build a new memory key object which covers the new memory
> > > map. Since the virtqueue objects are referencing this memory key, I need
> > > to destroy them and build new ones referncing the new memory key.
> > >
> > > > Thanks
> > > >
> > > >
> > > > >
> > > > > -Siwei
> > > > >
> > > > > > Thanks
> > > > > >
> > > > > >
> > > > > > > > -Siwei
> > > > > > > >
> > > > > > > >
> > > > > > > > > Signed-off-by: Eli Cohen <[email protected]>
> > > > > > > > > ---
> > > > > > > > > drivers/vdpa/mlx5/net/mlx5_vnet.c | 8 --------
> > > > > > > > > 1 file changed, 8 deletions(-)
> > > > > > > > >
> > > > > > > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > > > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > > > index 88dde3455bfd..549ded074ff3 100644
> > > > > > > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > > > @@ -1148,8 +1148,6 @@ static int setup_vq(struct
> > > > > > > > > mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
> > > > > > > > >
> > > > > > > > > static void suspend_vq(struct mlx5_vdpa_net *ndev, struct
> > > > > > > > > mlx5_vdpa_virtqueue *mvq)
> > > > > > > > > {
> > > > > > > > > - struct mlx5_virtq_attr attr;
> > > > > > > > > -
> > > > > > > > > if (!mvq->initialized)
> > > > > > > > > return;
> > > > > > > > >
> > > > > > > > > @@ -1158,12 +1156,6 @@ static void suspend_vq(struct
> > > > > > > > > mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *m
> > > > > > > > >
> > > > > > > > > if (modify_virtqueue(ndev, mvq,
> > > > > > > > > MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND))
> > > > > > > > > mlx5_vdpa_warn(&ndev->mvdev, "modify to
> > > > > > > > > suspend failed\n");
> > > > > > > > > -
> > > > > > > > > - if (query_virtqueue(ndev, mvq, &attr)) {
> > > > > > > > > - mlx5_vdpa_warn(&ndev->mvdev, "failed to query
> > > > > > > > > virtqueue\n");
> > > > > > > > > - return;
> > > > > > > > > - }
> > > > > > > > > - mvq->avail_idx = attr.available_index;
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > static void suspend_vqs(struct mlx5_vdpa_net *ndev)
> > > > > > > > > --
> > > > > > > > > 2.29.2
> > > > > > > > >
> > > >