Re: [PATCH v8 23/27] migration: Propagate last encountered error in vmstate_save_state_v() function

Akihiko Odaki Sun, 03 Aug 2025 01:38:31 -0700

On 2025/08/02 2:27, Arun Menon wrote:

Hi Akihiko,
Thanks for the review.


On Fri, Aug 01, 2025 at 04:35:34PM +0900, Akihiko Odaki wrote:

On 2025/07/31 22:21, Arun Menon wrote:

Currently post_save hook is called without checking its return. If post_save
fails, we need to set an error and propagate it to the caller.

Since post_save hook is called regardless of whether there is a preceeding 
error,
it is possible that we have 2 distict errors, one from the preceeding function
call, and the other from the post_save call.

Return the latest error to the caller.


This needs to be explained better. This patch makes two changes on the
behavior when there are two errors:

1) Proceeding errors were propagated before, but they are now
    dismissed.
2) post_error() errors were dismissed before, but they are now
    propagated.

This message doesn't mention 1) at all. It does say 2) is necessary, but
does not explain why.


Please correct me if I am wrong, does the following look ok?

Currently post_save() hook is called without checking its return.
post_save() hooks, generally does the cleanup, and that is the reason why
we have been dismissing its failure.

We want to consider
   - the saving of the device state (save or subsection save) and
   - running the cleanup after
as an atomic operation. And that is why, post_save is called regardless
of whether there is a preceeding error.
This means that, it is possible that we have 2 distict errors, one from
the preceeding function and the other from the post_save() function.

This patch makes two changes on the behavior when there are two errors:

1) Preceeding errors were propagated before, but they are now
    dismissed if there is a new post_save() error.
2) post_save() errors were dismissed before, but they are now
    propagated.

We intend to extend the error propagation feature (saving erros in
Error object and return to the caller) to the post/pre
save/load hooks. Therefore, it is important for the user to know
the exact reason of failure in case any of these hooks fail.

====
I do not know much about the operations we do, or intend to do
using the post_save() call. But my guess is that we are going to
revert things or cleanup stuff, and if reverting/cleanup fails,
then the user should be notified about that.

All what you stated sounds correct to me and explains 2). But there isstill no reasoning for 1).


Previously I listed three possible design options:
> - propagate whatever error easier to do so
> - somehow combine errors
>   (languages that support exceptions often do this)
> - make sure that no error will happen when there is a proceeding error
>   by having two callbacks:
>   - One callback that is called only when there is no proceeding error
>     and can raise an error
>   - Another that is always called but cannot raise an error

https://lore.kernel.org/qemu-devel/[email protected]/

It will be a good reasoning if we can show propagating post_save()errors is easier (or more important) than propagating proceeding errors.

There are alternative options. "Making sure that no error will happenwhen there is a proceeding error by having two callbacks" is one ofthem. You wrote:

> We want to consider
>    - the saving of the device state (save or subsection save) and
>    - running the cleanup after
> as an atomic operation. And that is why, post_save is called
> regardless of whether there is a preceeding error.
> This means that, it is possible that we have 2 distict errors, one
> from the preceeding function and the other from the post_save()
> function.

But it may not be mandatory. In fact, I searched "post_save" in the codebase and found all implementations are for cleanup and always succeeds,which makes sense; most (if not all) cleanup operations like g_free()always succeeds in general.

So perhaps "making sure that no error will happen when there is aproceeding error" may be feasible, and it may not even require twocallbacks because post_save() always succeeds.


Regards,
Akihiko Odaki

Re: [PATCH v8 23/27] migration: Propagate last encountered error in vmstate_save_state_v() function

Reply via email to