On 2025/09/01 1:38, Arun Menon wrote:
Hi,

On Mon, Sep 01, 2025 at 01:04:40AM +0900, Akihiko Odaki wrote:
On 2025/09/01 0:45, Arun Menon wrote:
Hi Akihiko,
Thanks for the review.

On Sat, Aug 30, 2025 at 02:58:05PM +0900, Akihiko Odaki wrote:
On 2025/08/30 5:01, Arun Menon wrote:
This is an incremental step in converting vmstate loading
code to report error via Error objects instead of directly
printing it to console/monitor.
It is ensured that qemu_loadvm_state() must report an error
in errp, in case of failure.

When postcopy live migration runs, the device states are loaded by
both the qemu coroutine process_incoming_migration_co() and the
postcopy_ram_listen_thread(). Therefore, it is important that the
coroutine also reports the error in case of failure, with
error_report_err(). Otherwise, the source qemu will not display
any errors before going into the postcopy pause state.

Reviewed-by: Marc-AndrĂ© Lureau <[email protected]>
Reviewed-by: Fabiano Rosas <[email protected]>
Signed-off-by: Arun Menon <[email protected]>
---
    migration/migration.c |  9 +++++----
    migration/savevm.c    | 30 ++++++++++++++++++------------
    migration/savevm.h    |  2 +-
    3 files changed, 24 insertions(+), 17 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 
10c216d25dec01f206eacad2edd24d21f00e614c..c6768d88f45c870c7fad9b9957300766ff69effc
 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -881,7 +881,7 @@ process_incoming_migration_co(void *opaque)
                          MIGRATION_STATUS_ACTIVE);
        mis->loadvm_co = qemu_coroutine_self();
-    ret = qemu_loadvm_state(mis->from_src_file);
+    ret = qemu_loadvm_state(mis->from_src_file, &local_err);
        mis->loadvm_co = NULL;
        trace_vmstate_downtime_checkpoint("dst-precopy-loadvm-completed");
@@ -908,7 +908,8 @@ process_incoming_migration_co(void *opaque)
        }
        if (ret < 0) {
-        error_setg(&local_err, "load of migration failed: %s", strerror(-ret));
+        error_prepend(&local_err, "load of migration failed: %s: ",
+                      strerror(-ret));
            goto fail;
        }
@@ -924,13 +925,13 @@ fail:
        migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
                          MIGRATION_STATUS_FAILED);
        migrate_set_error(s, local_err);
-    error_free(local_err);
+    error_report_err(local_err);

This is problematic because it results in duplicate error reports when
!mis->exit_on_error; in that case the query-migrate QMP command reports the
error and this error reporting is redundant.

If I comment this change, then all of the errors propagated up to now, using
error_setg() will not be reported. This is the place where it is finally 
reported,
when qemu_loadvm_state() fails. In other words, all the error_reports() we 
removed
from all the files, replacing them with error_setg(), will finally be reported 
here
using error_report_err().

My understanding of the code without these two changes is:
- If the migrate-incoming QMP command is used with false as
   exit-on-error, this function will not report the error but
   the query-migrate QMP command will report the error.
- Otherwise, this function reports the error.

With my limited experience in testing, I have a question,
So there are 2 scenarios,
1. running the virsh migrate command on the source host. Something like the 
following,
   virsh -c 'qemu:///system' migrate --live --verbose --domain guest-vm 
--desturi qemu+ssh://10.6.120.20/system
   OR for postcopy-ram,
   virsh migrate guest-vm --live qemu+ssh://10.6.120.20/system --verbose 
--postcopy --timeout 10 --timeout-postcopy

2. Using QMP commands, performing a migration from source to destination.
   Running something like the following on the destination:
   {
     "execute": "migrate-incoming",
     "arguments": {
       "uri": "tcp:127.0.0.1:7777",
       "exit-on-error": false
     }
   }
   {
     "execute": "migrate-incoming",
     "arguments": {
       "uri": "tcp:127.0.0.1:7777",
       "exit-on-error": false
     }
   }
   and the somthing like the following on source:
   {
     "execute": "migrate",
     "arguments": {
       "uri": "tcp:127.0.0.1:7777"
     }
   }
   {"execute" : "query-migrate"}

In 1, previously, the user used to get an error message on migration failure.
This was because there were error_report() calls in all of the files.
Now that they are replaced with error_setg() and the error is stored in errp,
we need to display that using error_report_err(). Hence I introduced an 
error_report_err()
call in the fail section.

In 2, we have 2 QMP sessions, one for the source and another for the 
destination.
The QMP command migrate will be issued on the source, and the errp will be set.
I did not understand the part where the message will be displayed because of the
error_report_err() call. I did not see such a message on failure scenario on 
both
the sessions.
If the user wants to check for errors, then the destination qemu will not exit
(exit-on-error = false ) and we can retrieve it using {"execute" : 
"query-migrate"}

Aren't the 2 scenarios different by nature?

In 1, doesn't libvirt query the error with query-migrate and print it?

In any case, it would be nice if you describe how libvirt interacts with QEMU in 1.

Regards,
Akihiko Odaki

Reply via email to