"Maciej S. Szmigiero" <[email protected]> writes:
> On 29.08.2024 02:41, Fabiano Rosas wrote:
>> "Maciej S. Szmigiero" <[email protected]> writes:
>>
>>> From: "Maciej S. Szmigiero" <[email protected]>
>>>
>>> A new function multifd_queue_device_state() is provided for device to queue
>>> its state for transmission via a multifd channel.
>>>
>>> Signed-off-by: Maciej S. Szmigiero <[email protected]>
>>> ---
>>> include/migration/misc.h | 4 ++
>>> migration/meson.build | 1 +
>>> migration/multifd-device-state.c | 99 ++++++++++++++++++++++++++++++++
>>> migration/multifd-nocomp.c | 6 +-
>>> migration/multifd-qpl.c | 2 +-
>>> migration/multifd-uadk.c | 2 +-
>>> migration/multifd-zlib.c | 2 +-
>>> migration/multifd-zstd.c | 2 +-
>>> migration/multifd.c | 65 +++++++++++++++------
>>> migration/multifd.h | 29 +++++++++-
>>> 10 files changed, 184 insertions(+), 28 deletions(-)
>>> create mode 100644 migration/multifd-device-state.c
>>>
>>> diff --git a/include/migration/misc.h b/include/migration/misc.h
>>> index bfadc5613bac..7266b1b77d1f 100644
>>> --- a/include/migration/misc.h
>>> +++ b/include/migration/misc.h
>>> @@ -111,4 +111,8 @@ bool migration_in_bg_snapshot(void);
>>> /* migration/block-dirty-bitmap.c */
>>> void dirty_bitmap_mig_init(void);
>>>
>>> +/* migration/multifd-device-state.c */
>>> +bool multifd_queue_device_state(char *idstr, uint32_t instance_id,
>>> + char *data, size_t len);
>>> +
>>> #endif
>>> diff --git a/migration/meson.build b/migration/meson.build
>>> index 77f3abf08eb1..00853595894f 100644
>>> --- a/migration/meson.build
>>> +++ b/migration/meson.build
>>> @@ -21,6 +21,7 @@ system_ss.add(files(
>>> 'migration-hmp-cmds.c',
>>> 'migration.c',
>>> 'multifd.c',
>>> + 'multifd-device-state.c',
>>> 'multifd-nocomp.c',
>>> 'multifd-zlib.c',
>>> 'multifd-zero-page.c',
>>> diff --git a/migration/multifd-device-state.c
>>> b/migration/multifd-device-state.c
>>> new file mode 100644
>>> index 000000000000..c9b44f0b5ab9
>>> --- /dev/null
>>> +++ b/migration/multifd-device-state.c
>>> @@ -0,0 +1,99 @@
>>> +/*
>>> + * Multifd device state migration
>>> + *
>>> + * Copyright (C) 2024 Oracle and/or its affiliates.
>>> + *
>>> + * This work is licensed under the terms of the GNU GPL, version 2 or
>>> later.
>>> + * See the COPYING file in the top-level directory.
>>> + */
>>> +
>>> +#include "qemu/osdep.h"
>>> +#include "qemu/lockable.h"
>>> +#include "migration/misc.h"
>>> +#include "multifd.h"
>>> +
>>> +static QemuMutex queue_job_mutex;
>>> +
>>> +static MultiFDSendData *device_state_send;
>>> +
>>> +size_t multifd_device_state_payload_size(void)
>>> +{
>>> + return sizeof(MultiFDDeviceState_t);
>>> +}
>>
>> This will not be necessary because the payload size is the same as the
>> data type. We only need it for the special case where the MultiFDPages_t
>> is smaller than the total ram payload size.
>
> I know - I just wanted to make the API consistent with the one RAM
> handler provides since these multifd_send_data_alloc() calls are done
> just once per migration - it isn't any kind of a hot path.
>
I think the array at the end of MultiFDPages_t should be considered
enough of a hack that we might want to keep anything related to it
outside of the interface. Other clients shouldn't have to think about
that at all.
>>> @@ -397,20 +404,16 @@ bool multifd_send(MultiFDSendData **send_data)
>>>
>>> p = &multifd_send_state->params[i];
>>> /*
>>> - * Lockless read to p->pending_job is safe, because only multifd
>>> - * sender thread can clear it.
>>> + * Lockless RMW on p->pending_job_preparing is safe, because only
>>> multifd
>>> + * sender thread can clear it after it had seen p->pending_job
>>> being set.
>>> + *
>>> + * Pairs with qatomic_store_release() in multifd_send_thread().
>>> */
>>> - if (qatomic_read(&p->pending_job) == false) {
>>> + if (qatomic_cmpxchg(&p->pending_job_preparing, false, true) ==
>>> false) {
>>
>> What's the motivation for this change? It would be better to have it in
>> a separate patch with a proper justification.
>
> The original RFC patch set used dedicated device state multifd channels.
>
> Peter and other people wanted this functionality removed, however this caused
> a performance (downtime) regression.
>
> One of the things that seemed to help mitigate this regression was making
> the multifd channel selection more fair via this change.
>
> But I can split out it to a separate commit in the next patch set version and
> then see what performance improvement it currently brings.
Yes, better to have it separate if anything for documentation of the
rationale.