Leonardo Bras <[email protected]> wrote:
> When zero-copy-send is enabled, each loop iteration of the
> multifd_send_thread will calls for qio_channel_write_*() twice: The first
> one for sending the header without zero-copy flag and the second one for
> sending the memory pages, with zero-copy flag enabled.
>
> This ends up calling two syscalls per loop iteration, where one should be
> enough.
>
> Also, since the behavior for non-zero-copy write is synchronous, and the
> behavior for zero-copy write is asynchronous, it ends up interleaving
> synchronous and asynchronous writes, hurting performance that could
> otherwise be improved.
>
> The choice of sending the header without the zero-copy flag in a separated
> write happened because the header memory area would be reused in the next
> write, so it was almost certain to have changed before the kernel could
> send the packet.
>
> To send the packet with zero-copy, create an array of header area instead
> of a single one, and use a different header area after each write. Also,
> flush the sending queue after all the headers have been used.
>
> To avoid adding a zero-copy conditional in multifd_send_fill_packet(),
> add a packet parameter (the packet that should be filled). This way it's
> simpler to pick which element of the array will be used as a header.
>
> Suggested-by: Juan Quintela <[email protected]>
> Signed-off-by: Leonardo Bras <[email protected]>
>
> + if (use_zero_copy_send) {
> + p->packet_idx = (p->packet_idx + 1) % HEADER_ARR_SZ;
> +
> + if (!p->packet_idx && (multifd_zero_copy_flush(p->c) < 0)) {
> + break;
> + }
> + header = (void *)p->packet + p->packet_idx * p->packet_len;
Isn't this equivalent to?
header = &(p->packet[p->packet_idx]);
> for (i = 0; i < thread_count; i++) {
> MultiFDSendParams *p = &multifd_send_state->params[i];
> + int j;
For new code you can:
> qemu_mutex_init(&p->mutex);
> qemu_sem_init(&p->sem, 0);
> @@ -940,9 +940,13 @@ int multifd_save_setup(Error **errp)
> p->pages = multifd_pages_init(page_count);
> p->packet_len = sizeof(MultiFDPacket_t)
> + sizeof(uint64_t) * page_count;
> - p->packet = g_malloc0(p->packet_len);
> - p->packet->magic = cpu_to_be32(MULTIFD_MAGIC);
> - p->packet->version = cpu_to_be32(MULTIFD_VERSION);
> + p->packet = g_malloc0_n(HEADER_ARR_SZ, p->packet_len);
> + for (j = 0; j < HEADER_ARR_SZ ; j++) {
for (int j = 0; j < HEADER_ARR_SZ ; j++) {
> + MultiFDPacket_t *packet = (void *)p->packet + j * p->packet_len;
> + packet->magic = cpu_to_be32(MULTIFD_MAGIC);
> + packet->version = cpu_to_be32(MULTIFD_VERSION);
Can't you use here:
packet[j].magic = cpu_to_be32(MULTIFD_MAGIC);
packet[j].version = cpu_to_be32(MULTIFD_VERSION);
And call it a day?
The rest is fine for me. Thanks for the effort.
Later, Juan.