Peter Xu <[email protected]> writes:
> On Wed, Aug 02, 2023 at 05:04:45PM -0300, Fabiano Rosas wrote:
>> Peter Xu <[email protected]> writes:
>>
>> >> + if (await_return_path_close_on_source(s)) {
>> >> + trace_migration_return_path_pause_err();
>> >> + return MIG_THR_ERR_FATAL;
>> >> + }
>> >
>> > I see that here on return path failures we'll bail out, and actually it's
>> > against the instinction (that when pause it should have failed, so it's
>> > weird why it's returning 0).
>> >
>> > So how about above suggestion, plus here we just call
>> > await_return_path_close_on_source(), without caring about the retval?
>>
>> So you are suggesting to remove the knowledge of the retry entirely from
>> the thread. It just reports the error and the postcopy_pause takes the
>> responsibility of ignoring it when we want to retry... It could be
>> clearer that way indeed.
>
> That error doesn't really important IMHO here, because the to-dst-file
> should have already errored out anyway.
>
> I just think it cleaner if we reset rp_error only until the new thread
> created.
>
ok
>>
>> It would trigger when a rp error happened that wasn't related to the
>> QEMUFile. If we go with your suggestion above, then this goes away.
>
> With your current patch where rp_error seems to be always reset when thread
> quit, if that's true then it'll 100% happen that this will not trigger.
>
> But yeah this is a trivial spot, feel free to choose the best if you plan
> to reorganize this patch a bit. Thanks.
My patch just resets the error when doing postcopy and the error is a
QEMUFile error. The header validation at the start of the loop could
still set rp_state.error and return without going through the postcopy
retry:
if (header_type >= MIG_RP_MSG_MAX ||
header_type == MIG_RP_MSG_INVALID) {
error_report("RP: Received invalid message 0x%04x length 0x%04x",
header_type, header_len);
mark_source_rp_bad(ms);
goto out;
}