Thomas Huth <[email protected]> writes:

> On 18/07/2023 14.55, Milan Zamazal wrote:
>> Thomas Huth <[email protected]> writes:
>> 
>>> On 11/07/2023 01.02, Michael S. Tsirkin wrote:
>>>> From: Milan Zamazal <[email protected]>
>>>> We don't have a virtio-scmi implementation in QEMU and only support
>>>
>>>> a
>>>> vhost-user backend.  This is very similar to virtio-gpio and we add the 
>>>> same
>>>> set of tests, just passing some vhost-user messages over the control 
>>>> socket.
>>>> Signed-off-by: Milan Zamazal <[email protected]>
>>>> Acked-by: Thomas Huth <[email protected]>
>>>> Message-Id: <[email protected]>
>>>> Reviewed-by: Michael S. Tsirkin <[email protected]>
>>>> Signed-off-by: Michael S. Tsirkin <[email protected]>
>>>> ---
>>>>    tests/qtest/libqos/virtio-scmi.h |  34 ++++++
>>>>    tests/qtest/libqos/virtio-scmi.c | 174 +++++++++++++++++++++++++++++++
>>>>    tests/qtest/vhost-user-test.c    |  44 ++++++++
>>>>    MAINTAINERS                      |   1 +
>>>>    tests/qtest/libqos/meson.build   |   1 +
>>>>    5 files changed, 254 insertions(+)
>>>>    create mode 100644 tests/qtest/libqos/virtio-scmi.h
>>>>    create mode 100644 tests/qtest/libqos/virtio-scmi.c
>>>
>>>   Hi!
>>>
>>> I'm seeing some random failures with this new scmi test, so far only
>>> on non-x86 systems, e.g.:
>>>
>>>   https://app.travis-ci.com/github/huth/qemu/jobs/606246131#L4774
>>>
>>> It also reproduces on a s390x host here, but only if I run "make check
>>> -j$(nproc)" - if I run the tests single-threaded, the qos-test passes
>>> there. Seems like there is a race somewhere in this test?
>> 
>> Hmm, it's basically the same as virtio-gpio.c test, so it should be OK.
>> Is it possible that the two tests (virtio-gpio.c & virtio-scmi.c)
>> interfere with each other in some way?  Is there possibly a way to
>> serialize them to check?
>
> I think within one qos-test, the sub-tests are already run serialized. But 
> there might be multiple qos-tests running in parallel, e.g. one for the 
> aarch64 target and one for the ppc64 target. And indeed, I can reproduce the 
> problem on my x86 laptop by running this in one terminal window:
>
> for ((x=0;x<1000;x++)); do \
>   QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon \
>   G_TEST_DBUS_DAEMON=.tests/dbus-vmstate-daemon.sh \
>   QTEST_QEMU_BINARY=./qemu-system-ppc64 \
>   MALLOC_PERTURB_=188 QTEST_QEMU_IMG=./qemu-img \
>   tests/qtest/qos-test -p \
>   
> /ppc64/pseries/spapr-pci-host-bridge/pci-bus-spapr/pci-bus/vhost-user-scmi-pci/vhost-user-scmi/vhost-user-scmi-tests/scmi/read-guest-mem/memfile
>  \
>   || break ; \
> done
>
> And this in another terminal window at the same time:
>
> for ((x=0;x<1000;x++)); do \
>   QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon \
>   G_TEST_DBUS_DAEMON=.tests/dbus-vmstate-daemon.sh \
>   QTEST_QEMU_BINARY=./qemu-system-aarch64 \
>   MALLOC_PERTURB_=188 QTEST_QEMU_IMG=./qemu-img \
>   tests/qtest/qos-test -p \
>   
> /aarch64/virt/generic-pcihost/pci-bus-generic/pci-bus/vhost-user-scmi-pci/vhost-user-scmi/vhost-user-scmi-tests/scmi/read-guest-mem/memfile
>  \
>   || break ; \
> done
>
> After a while, the aarch64 test broke with:
>
> /aarch64/virt/generic-pcihost/pci-bus-generic/pci-bus/vhost-user-scmi-pci/vhost-user-scmi/vhost-user-scmi-tests/scmi/read-guest-mem/memfile:
>  qemu-system-aarch64: Failed to set msg fds.
> qemu-system-aarch64: Failed to set msg fds.
> qemu-system-aarch64: vhost VQ 0 ring restore failed: -22: Invalid argument 
> (22)
> qemu-system-aarch64: Failed to set msg fds.
> qemu-system-aarch64: vhost VQ 1 ring restore failed: -22: Invalid argument 
> (22)
> qemu-system-aarch64: Failed to set msg fds.
> qemu-system-aarch64: vhost_set_vring_call failed 22
> qemu-system-aarch64: Failed to set msg fds.
> qemu-system-aarch64: vhost_set_vring_call failed 22
> qemu-system-aarch64: Failed to write msg. Wrote -1 instead of 20.
> qemu-system-aarch64: Failed to set msg fds.
> qemu-system-aarch64: vhost VQ 0 ring restore failed: -22: Invalid argument 
> (22)
> qemu-system-aarch64: Failed to set msg fds.
> qemu-system-aarch64: vhost VQ 1 ring restore failed: -22: Invalid argument 
> (22)
> qemu-system-aarch64: ../../devel/qemu/hw/pci/msix.c:659: 
> msix_unset_vector_notifiers: Assertion `dev->msix_vector_use_notifier && 
> dev->msix_vector_release_notifier' failed.
> ../../devel/qemu/tests/qtest/libqtest.c:200: kill_qemu() detected QEMU death 
> from signal 6 (Aborted) (core dumped)

If it helps,

it looks like msix_unset_vector_notifiers is being called twice, once
from vu_scmi_set_status() and another from vu_scmi_disconnect():

msix_unset_vector_notifiers
virtio_pci_set_guest_notifiers
vu_scmi_stop
vu_scmi_disconnect   <-
vu_scmi_event
chr_be_event
qemu_chr_be_event
tcp_chr_disconnect_locked
tcp_chr_write
qemu_chr_write_buffer

msix_unset_vector_notifiers
virtio_pci_set_guest_notifiers
vu_scmi_stop
vu_scmi_set_status   <-
virtio_set_status
virtio_vmstate_change
vm_state_notify
do_vm_stop
vm_shutdown
qemu_cleanup

Reply via email to