Hello Avihai,
On 10/20/24 15:01, Avihai Horon wrote:
When the VM is shut down vfio_vmstate_change/_prepare() are called to
transition the VFIO device state to STOP. They are called after
migration_shutdown() and thus, by this time, the migration object is
already freed (more specifically, MigrationState->qemu_file_lock is
already destroyed).
In this case, if there is an error in vfio_vmstate_change/_prepare(), it
calls migration_file_set_error() which tries to lock the already
destroyed MigrationState->qemu_file_lock, leading to the following
assert:
qemu-system-x86_64: ../util/qemu-thread-posix.c:92: qemu_mutex_lock_impl:
Assertion `mutex->initialized' failed.
Fix this by not setting migration file error in the shut down flow.
Fixes: 20c64c8a51a4 ("migration: migration_file_set_error")
Signed-off-by: Avihai Horon <avih...@nvidia.com>
---
hw/vfio/migration.c | 31 +++++++++++++++++++++----------
1 file changed, 21 insertions(+), 10 deletions(-)
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 992dc3b102..1c44b036ea 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -783,6 +783,25 @@ static const SaveVMHandlers savevm_vfio_handlers = {
/* ---------------------------------------------------------------------- */
+static void vfio_vmstate_change_error_report(int ret, Error *err,
+ RunState state)
+{
+ if (state == RUN_STATE_SHUTDOWN) {
+ /*
+ * If VM is being shut down, migration object might have already been
+ * freed, so just report the error.
+ */
+ error_report_err(err);
+ return;
+ }
+
+ /*
+ * Migration should be aborted in this case, but vm_state_notify()
+ * currently does not support reporting failures.
+ */
+ migration_file_set_error(ret, err);
+}
This seems correct, but testing the machine's runtime state to
work around the fact that 'current_migration' is not safe to
use reveals a flaw.
In vfio, migration_is_setup_or_active() is unsafe. So are the
calls to vfio_set_migration_error().
This comment seems in contradiction with the migration code :
/* When we add fault tolerance, we could have several
migrations at once. For now we don't need to add
dynamic creation of migration */
Why is 'current_migration' allocated and destroyed today ?
Could we replace it with a singleton and switch the state to
MIGRATION_STATUS_NONE instead of destroying it ?
If not, should 'current_migration' be set to NULL in
migration_shutdown() and its value tested before accessing
to any of its fields, in a thread safe manner if necessary.
Thanks,
C.