Fabiano Rosas <[email protected]> writes:

> Thomas Huth <[email protected]> writes:
>
>> On 18/07/2023 14.55, Milan Zamazal wrote:
>>> Thomas Huth <[email protected]> writes:
>>>
>>>> On 11/07/2023 01.02, Michael S. Tsirkin wrote:
>>>>> From: Milan Zamazal <[email protected]>
>>>>> We don't have a virtio-scmi implementation in QEMU and only
>>>>> support a vhost-user backend.  This is very similar to virtio-gpio
>>>>> and we add the same set of tests, just passing some vhost-user
>>>>> messages over the control socket.
>>>>> Signed-off-by: Milan Zamazal <[email protected]>
>>>>> Acked-by: Thomas Huth <[email protected]>
>>>>> Message-Id: <[email protected]>
>>>>> Reviewed-by: Michael S. Tsirkin <[email protected]>
>>>>> Signed-off-by: Michael S. Tsirkin <[email protected]>
>>>>> ---
>>>>>  tests/qtest/libqos/virtio-scmi.h |  34 ++++++
>>>>>  tests/qtest/libqos/virtio-scmi.c | 174 +++++++++++++++++++++++++++++++
>>>>>  tests/qtest/vhost-user-test.c    |  44 ++++++++
>>>>>  MAINTAINERS                      |   1 +
>>>>>  tests/qtest/libqos/meson.build   |   1 +
>>>>>  5 files changed, 254 insertions(+)
>>>>>  create mode 100644 tests/qtest/libqos/virtio-scmi.h
>>>>>  create mode 100644 tests/qtest/libqos/virtio-scmi.c
>>>>
>>>> Hi!
>>>>
>>>> I'm seeing some random failures with this new scmi test, so far only
>>>> on non-x86 systems, e.g.:
>>>>
>>>> https://app.travis-ci.com/github/huth/qemu/jobs/606246131#L4774
>>>>
>>>> It also reproduces on a s390x host here, but only if I run "make check
>>>> -j$(nproc)" - if I run the tests single-threaded, the qos-test passes
>>>> there. Seems like there is a race somewhere in this test?
>>>
>>> Hmm, it's basically the same as virtio-gpio.c test, so it should be OK.
>>> Is it possible that the two tests (virtio-gpio.c & virtio-scmi.c)
>>> interfere with each other in some way?  Is there possibly a way to
>>> serialize them to check?
>>
>> I think within one qos-test, the sub-tests are already run
>> serialized. But there might be multiple qos-tests running in
>> parallel, e.g. one for the aarch64 target and one for the ppc64
>> target. And indeed, I can reproduce the problem on my x86 laptop by
>> running this in one terminal window:
>>
>> for ((x=0;x<1000;x++)); do \
>>   QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon \
>>   G_TEST_DBUS_DAEMON=.tests/dbus-vmstate-daemon.sh \
>>   QTEST_QEMU_BINARY=./qemu-system-ppc64 \
>>   MALLOC_PERTURB_=188 QTEST_QEMU_IMG=./qemu-img \
>>   tests/qtest/qos-test -p \
>>   /ppc64/pseries/spapr-pci-host-bridge/pci-bus-spapr/pci-bus/vhost-user-scmi-pci/vhost-user-scmi/vhost-user-scmi-tests/scmi/read-guest-mem/memfile \
>>   || break ; \
>> done
>>
>> And this in another terminal window at the same time:
>>
>> for ((x=0;x<1000;x++)); do \
>>   QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon \
>>   G_TEST_DBUS_DAEMON=.tests/dbus-vmstate-daemon.sh \
>>   QTEST_QEMU_BINARY=./qemu-system-aarch64 \
>>   MALLOC_PERTURB_=188 QTEST_QEMU_IMG=./qemu-img \
>>   tests/qtest/qos-test -p \
>>   /aarch64/virt/generic-pcihost/pci-bus-generic/pci-bus/vhost-user-scmi-pci/vhost-user-scmi/vhost-user-scmi-tests/scmi/read-guest-mem/memfile \
>>   || break ; \
>> done
>>
>> After a while, the aarch64 test broke with:
>>
>> /aarch64/virt/generic-pcihost/pci-bus-generic/pci-bus/vhost-user-scmi-pci/vhost-user-scmi/vhost-user-scmi-tests/scmi/read-guest-mem/memfile:
>> qemu-system-aarch64: Failed to set msg fds.
>> qemu-system-aarch64: Failed to set msg fds.
>> qemu-system-aarch64: vhost VQ 0 ring restore failed: -22: Invalid argument (22)
>> qemu-system-aarch64: Failed to set msg fds.
>> qemu-system-aarch64: vhost VQ 1 ring restore failed: -22: Invalid argument (22)
>> qemu-system-aarch64: Failed to set msg fds.
>> qemu-system-aarch64: vhost_set_vring_call failed 22
>> qemu-system-aarch64: Failed to set msg fds.
>> qemu-system-aarch64: vhost_set_vring_call failed 22
>> qemu-system-aarch64: Failed to write msg. Wrote -1 instead of 20.
>> qemu-system-aarch64: Failed to set msg fds.
>> qemu-system-aarch64: vhost VQ 0 ring restore failed: -22: Invalid argument (22)
>> qemu-system-aarch64: Failed to set msg fds.
>> qemu-system-aarch64: vhost VQ 1 ring restore failed: -22: Invalid argument (22)
>> qemu-system-aarch64: ../../devel/qemu/hw/pci/msix.c:659:
>> msix_unset_vector_notifiers: Assertion
>> `dev->msix_vector_use_notifier && dev->msix_vector_release_notifier'
>> failed.
>> ../../devel/qemu/tests/qtest/libqtest.c:200: kill_qemu() detected
>> QEMU death from signal 6 (Aborted) (core dumped)
>
> If it helps,

It helps a lot, thank you!

> it looks like msix_unset_vector_notifiers is being called twice, once
> from vu_scmi_set_status() and another from vu_scmi_disconnect():

Interesting.  Usually, vu_scmi_stop is called only once, which explains
why the test regularly passes.  Both vu_scmi_stop callers have a check
protecting against duplicate vu_scmi_stop calls, but it's perhaps not
fully reliable.  I can see vhost-user-gpio has an extra protection.
I'll post a patch adding a similar thing; hopefully it will fix the
problem.

> msix_unset_vector_notifiers
> virtio_pci_set_guest_notifiers
> vu_scmi_stop
> vu_scmi_disconnect <-
> vu_scmi_event
> chr_be_event
> qemu_chr_be_event
> tcp_chr_disconnect_locked
> tcp_chr_write
> qemu_chr_write_buffer
>
> msix_unset_vector_notifiers
> virtio_pci_set_guest_notifiers
> vu_scmi_stop
> vu_scmi_set_status <-
> virtio_set_status
> virtio_vmstate_change
> vm_state_notify
> do_vm_stop
> vm_shutdown
> qemu_cleanup
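
For illustration only, the kind of extra protection meant above could be
sketched roughly as follows.  This is not the actual patch and not QEMU
code: DemoDevice, demo_start(), demo_stop(), demo_set_status() and
demo_disconnect() are hypothetical stand-ins, and the assert() merely
models the msix_unset_vector_notifiers assertion from the log.  The idea
is that the stop routine itself tracks whether the device is started and
clears that flag before tearing anything down, so a second call arriving
via another path becomes a no-op.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the device state; not QEMU's real struct. */
typedef struct DemoDevice {
    bool started;           /* set on start, cleared at the top of stop */
    bool notifiers_active;  /* models the installed guest notifiers */
    int teardown_count;     /* how often the teardown really ran */
} DemoDevice;

static void demo_start(DemoDevice *dev)
{
    if (dev->started) {
        return;
    }
    dev->notifiers_active = true;
    dev->started = true;
}

/* Returns true if teardown ran, false if the call was a duplicate. */
static bool demo_stop(DemoDevice *dev)
{
    if (!dev->started) {
        return false;       /* already stopped: ignore the second caller */
    }
    /* Clear the flag first so a nested call during teardown is ignored. */
    dev->started = false;

    /* Models the assertion that fires when notifiers are unset twice. */
    assert(dev->notifiers_active);
    dev->notifiers_active = false;
    dev->teardown_count++;
    return true;
}

/* Two independent paths that may both request a stop. */
static void demo_set_status(DemoDevice *dev, bool should_run)
{
    if (should_run) {
        demo_start(dev);
    } else {
        demo_stop(dev);
    }
}

static void demo_disconnect(DemoDevice *dev)
{
    demo_stop(dev);
}
```

With such a guard, a disconnect followed by a status change (or the
other way round) runs the teardown only once, regardless of which path
gets there first.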
