I'm not quite sure about the whole picture yet. It seems that while QEMU
is waiting for the reply to VHOST_USER_GET_STATUS, DPDK sends out a
VHOST_USER_SLAVE_IOTLB_MSG, which triggers the QEMU function
vhost_backend_update_device_iotlb. QEMU then waits for the reply to
VHOST_USER_IOTLB_MSG but receives the VHOST_USER_GET_STATUS reply instead.
Backtrace from a breakpoint on the first error message ("Received
unexpected msg type. Expected 22 received 40"):
#0 0x0000555555b72ed4 in process_message_reply (dev=0x5555584dd600,
msg=0x7fffffffa330) at ../hw/virtio/vhost-user.c:445
#1 0x0000555555b77c26 in vhost_user_send_device_iotlb_msg (dev=0x5555584dd600,
imsg=0x7fffffffa600) at ../hw/virtio/vhost-user.c:2341
#2 0x0000555555b7179e in vhost_backend_update_device_iotlb
(dev=0x5555584dd600, iova=10442706944, uaddr=140736119902208, len=4096,
perm=IOMMU_RW) at ../hw/virtio/vhost-backend.c:361
#3 0x0000555555b6e34c in vhost_device_iotlb_miss (dev=0x5555584dd600,
iova=10442706944, write=1) at ../hw/virtio/vhost.c:1113
#4 0x0000555555b718d9 in vhost_backend_handle_iotlb_msg (dev=0x5555584dd600,
imsg=0x7fffffffa7b0) at ../hw/virtio/vhost-backend.c:393
#5 0x0000555555b76144 in slave_read (ioc=0x555557a38680, condition=G_IO_IN,
opaque=0x5555584dd600) at ../hw/virtio/vhost-user.c:1726
#6 0x0000555555c797a5 in qio_channel_fd_source_dispatch
(source=0x555556a06fb0, callback=0x555555b75f86 <slave_read>,
user_data=0x5555584dd600) at ../io/channel-watch.c:84
#7 0x00007ffff554895d in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
#8 0x00007ffff5548d18 in g_main_context_iterate.isra () at
/lib64/libglib-2.0.so.0
#9 0x00007ffff5549042 in g_main_loop_run () at /lib64/libglib-2.0.so.0
#10 0x0000555555b72de7 in vhost_user_read (dev=0x5555584dd600,
msg=0x7fffffffac50) at ../hw/virtio/vhost-user.c:413
#11 0x0000555555b72e9b in process_message_reply (dev=0x5555584dd600,
msg=0x7fffffffaf10) at ../hw/virtio/vhost-user.c:439
#12 0x0000555555b77c26 in vhost_user_send_device_iotlb_msg (dev=0x5555584dd600,
imsg=0x7fffffffb1e0) at ../hw/virtio/vhost-user.c:2341
#13 0x0000555555b7179e in vhost_backend_update_device_iotlb
(dev=0x5555584dd600, iova=10468392960, uaddr=140736145588224, len=4096,
perm=IOMMU_RW) at ../hw/virtio/vhost-backend.c:361
#14 0x0000555555b6e34c in vhost_device_iotlb_miss (dev=0x5555584dd600,
iova=10468392960, write=1) at ../hw/virtio/vhost.c:1113
#15 0x0000555555b718d9 in vhost_backend_handle_iotlb_msg (dev=0x5555584dd600,
imsg=0x7fffffffb390) at ../hw/virtio/vhost-backend.c:393
#16 0x0000555555b76144 in slave_read (ioc=0x555557a38680, condition=G_IO_IN,
opaque=0x5555584dd600) at ../hw/virtio/vhost-user.c:1726
#17 0x0000555555c797a5 in qio_channel_fd_source_dispatch
(source=0x555556c70250, callback=0x555555b75f86 <slave_read>,
user_data=0x5555584dd600) at ../io/channel-watch.c:84
#18 0x00007ffff554895d in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
#19 0x00007ffff5548d18 in g_main_context_iterate.isra () at
/lib64/libglib-2.0.so.0
#20 0x00007ffff5549042 in g_main_loop_run () at /lib64/libglib-2.0.so.0
#21 0x0000555555b72de7 in vhost_user_read (dev=0x5555584dd600,
msg=0x7fffffffb830) at ../hw/virtio/vhost-user.c:413
#22 0x0000555555b72e9b in process_message_reply (dev=0x5555584dd600,
msg=0x7fffffffbaf0) at ../hw/virtio/vhost-user.c:439
#23 0x0000555555b77c26 in vhost_user_send_device_iotlb_msg (dev=0x5555584dd600,
imsg=0x7fffffffbdc0) at ../hw/virtio/vhost-user.c:2341
#24 0x0000555555b7179e in vhost_backend_update_device_iotlb
(dev=0x5555584dd600, iova=10442702848, uaddr=140736119898112, len=4096,
perm=IOMMU_RW) at ../hw/virtio/vhost-backend.c:361
#25 0x0000555555b6e34c in vhost_device_iotlb_miss (dev=0x5555584dd600,
iova=10442702848, write=1) at ../hw/virtio/vhost.c:1113
#26 0x0000555555b718d9 in vhost_backend_handle_iotlb_msg (dev=0x5555584dd600,
imsg=0x7fffffffbf70) at ../hw/virtio/vhost-backend.c:393
#27 0x0000555555b76144 in slave_read (ioc=0x555557a38680, condition=G_IO_IN,
opaque=0x5555584dd600) at ../hw/virtio/vhost-user.c:1726
#28 0x0000555555c797a5 in qio_channel_fd_source_dispatch
(source=0x555556f1a530, callback=0x555555b75f86 <slave_read>,
user_data=0x5555584dd600) at ../io/channel-watch.c:84
#29 0x00007ffff554895d in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
#30 0x00007ffff5548d18 in g_main_context_iterate.isra () at
/lib64/libglib-2.0.so.0
#31 0x00007ffff5549042 in g_main_loop_run () at /lib64/libglib-2.0.so.0
#32 0x0000555555b72de7 in vhost_user_read (dev=0x5555584dd600,
msg=0x7fffffffc420) at ../hw/virtio/vhost-user.c:413
#33 0x0000555555b754b1 in vhost_user_get_u64 (dev=0x5555584dd600, request=40,
u64=0x7fffffffc6e0) at ../hw/virtio/vhost-user.c:1349
#34 0x0000555555b758ff in vhost_user_get_status (dev=0x5555584dd600,
status=0x7fffffffc713 "W\020") at ../hw/virtio/vhost-user.c:1474
#35 0x0000555555b75967 in vhost_user_add_status (dev=0x5555584dd600, status=7
'\a') at ../hw/virtio/vhost-user.c:1488
#36 0x0000555555b78bf6 in vhost_user_dev_start (dev=0x5555584dd600,
started=true) at ../hw/virtio/vhost-user.c:2758
#37 0x0000555555b709ad in vhost_dev_start (hdev=0x5555584dd600,
vdev=0x555557b965d0, vrings=false) at ../hw/virtio/vhost.c:1988
#38 0x000055555584291c in vhost_net_start_one (net=0x5555584dd600,
dev=0x555557b965d0) at ../hw/net/vhost_net.c:271
#39 0x0000555555842f1e in vhost_net_start (dev=0x555557b965d0,
ncs=0x555557bc09e0, data_queue_pairs=1, cvq=0) at ../hw/net/vhost_net.c:412
#40 0x0000555555b1bf61 in virtio_net_vhost_status (n=0x555557b965d0, status=15
'\017') at ../hw/net/virtio-net.c:311
#41 0x0000555555b1c20c in virtio_net_set_status (vdev=0x555557b965d0, status=15
'\017') at ../hw/net/virtio-net.c:392
#42 0x0000555555b1ed04 in virtio_net_handle_mq (n=0x555557b965d0, cmd=0 '\000',
iov=0x555556c7ef50, iov_cnt=1) at ../hw/net/virtio-net.c:1497
#43 0x0000555555b1eef0 in virtio_net_handle_ctrl_iov (vdev=0x555557b965d0,
in_sg=0x555556a09880, in_num=1, out_sg=0x555556a09890, out_num=1) at
../hw/net/virtio-net.c:1534
#44 0x0000555555b1efe9 in virtio_net_handle_ctrl (vdev=0x555557b965d0,
vq=0x7fffc04ac140) at ../hw/net/virtio-net.c:1557
#45 0x0000555555b63776 in virtio_queue_notify_vq (vq=0x7fffc04ac140) at
../hw/virtio/virtio.c:2249
#46 0x0000555555b669dc in virtio_queue_host_notifier_read (n=0x7fffc04ac1b4) at
../hw/virtio/virtio.c:3529
#47 0x0000555555e3f458 in aio_dispatch_handler (ctx=0x555556a016c0,
node=0x7ffd8800e430) at ../util/aio-posix.c:369
#48 0x0000555555e3f613 in aio_dispatch_handlers (ctx=0x555556a016c0) at
../util/aio-posix.c:412
#49 0x0000555555e3f669 in aio_dispatch (ctx=0x555556a016c0) at
../util/aio-posix.c:422
#50 0x0000555555e585de in aio_ctx_dispatch (source=0x555556a016c0,
callback=0x0, user_data=0x0) at ../util/async.c:321
#51 0x00007ffff554895d in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
#52 0x0000555555e5abea in glib_pollfds_poll () at ../util/main-loop.c:295
#53 0x0000555555e5ac64 in os_host_main_loop_wait (timeout=0) at
../util/main-loop.c:318
#54 0x0000555555e5ad69 in main_loop_wait (nonblocking=0) at
../util/main-loop.c:604
#55 0x00005555559693de in qemu_main_loop () at ../softmmu/runstate.c:731
#56 0x00005555556e7c06 in qemu_default_main () at ../softmmu/main.c:37
#57 0x00005555556e7c3c in main (argc=71, argv=0x7fffffffcda8) at
../softmmu/main.c:48
-----Original Message-----
From: Maxime Coquelin <[email protected]>
Sent: Thursday, January 12, 2023 5:26 PM
To: Laurent Vivier <[email protected]>
Cc: [email protected]; Peter Maydell <[email protected]>; Yajun Wu
<[email protected]>; Parav Pandit <[email protected]>; Michael S. Tsirkin
<[email protected]>
Subject: Re: [PULL v4 76/83] vhost-user: Support vhost_dev_start
Hi Laurent,
On 1/11/23 10:50, Laurent Vivier wrote:
> On 1/9/23 11:55, Michael S. Tsirkin wrote:
>> On Fri, Jan 06, 2023 at 03:21:43PM +0100, Laurent Vivier wrote:
>>> Hi,
>>>
>>> it seems this patch breaks vhost-user with DPDK.
>>>
>>> See
>>> https://bugzilla.redhat.com/show_bug.cgi?id=2155173
>>>
>>> it seems QEMU doesn't receive the expected commands sequence:
>>>
>>> Received unexpected msg type. Expected 22 received 40
>>> Fail to update device iotlb
>>> Received unexpected msg type. Expected 40 received 22
>>> Received unexpected msg type. Expected 22 received 11
>>> Fail to update device iotlb
>>> Received unexpected msg type. Expected 11 received 22
>>> vhost VQ 1 ring restore failed: -71: Protocol error (71)
>>> Received unexpected msg type. Expected 22 received 11
>>> Fail to update device iotlb
>>> Received unexpected msg type. Expected 11 received 22
>>> vhost VQ 0 ring restore failed: -71: Protocol error (71)
>>> unable to start vhost net: 71: falling back on userspace virtio
>>>
>>> It receives VHOST_USER_GET_STATUS (40) when it expects
>>> VHOST_USER_IOTLB_MSG (22), VHOST_USER_IOTLB_MSG when it expects
>>> VHOST_USER_GET_STATUS, VHOST_USER_GET_VRING_BASE (11) when it expects
>>> VHOST_USER_GET_STATUS, and so on.
>>>
>>> Any idea?
We only have a single thread on the DPDK side to handle vhost-user
requests: it reads a request, handles it, replies to it, then reads the
next one, and so on. So I don't think it is possible to mix up the reply
order on the DPDK side.
Maybe there are two threads concurrently sending requests on the QEMU side?
Regards,
Maxime
>>> Thanks,
>>> Laurent
>>
>>
>> So I am guessing it's coming from:
>>
>>     if (msg.hdr.request != request) {
>>         error_report("Received unexpected msg type. Expected %d received %d",
>>                      request, msg.hdr.request);
>>         return -EPROTO;
>>     }
>>
>> in process_message_reply and/or in vhost_user_get_u64.
>>
>>
>>> On 11/7/22 23:53, Michael S. Tsirkin wrote:
>>>> From: Yajun Wu <[email protected]>
>>>>
>>>> The motivation of adding vhost-user vhost_dev_start support is to
>>>> improve backend configuration speed and reduce live migration VM
>>>> downtime.
>>>>
>>>> Today VQ configuration is issued one by one. For virtio net with
>>>> multi-queue support, backend needs to update RSS (Receive side
>>>> scaling) on every rx queue enable. Updating RSS is time-consuming
>>>> (typical time like 7ms).
>>>>
>>>> Implement already defined vhost status and message in the vhost
>>>> specification [1].
>>>> (a) VHOST_USER_PROTOCOL_F_STATUS
>>>> (b) VHOST_USER_SET_STATUS
>>>> (c) VHOST_USER_GET_STATUS
>>>>
>>>> Send message VHOST_USER_SET_STATUS with VIRTIO_CONFIG_S_DRIVER_OK
>>>> for device start and reset(0) for device stop.
>>>>
>>>> On reception of the DRIVER_OK message, backend can apply the needed
>>>> setting only once (instead of incremental) and also utilize
>>>> parallelism on enabling queues.
>>>>
>>>> This improves QEMU's live migration downtime with a vhost-user
>>>> backend implementation by a great margin, especially for a large
>>>> number of VQs: with 64 VQs, from 800 msec down to 250 msec.
>>>>
>>>> [1]
>>>> https://qemu-project.gitlab.io/qemu/interop/vhost-user.html
>>>>
>>>> Signed-off-by: Yajun Wu <[email protected]>
>>>> Acked-by: Parav Pandit <[email protected]>
>>>> Message-Id: <[email protected]>
>>>> Reviewed-by: Michael S. Tsirkin <[email protected]>
>>>> Signed-off-by: Michael S. Tsirkin <[email protected]>
>>
>> Probably easiest to debug from dpdk side.
>> Does the problem go away if you disable the feature
>> VHOST_USER_PROTOCOL_F_STATUS in dpdk?
>
> Maxime could you help to debug this?
>
> Thanks,
> Laurent
>