Hi Stefano, thanks for the reply!
On 2026-01-28 18:27, Stefano Brivio wrote:
> On Wed, 28 Jan 2026 14:06:11 +0100
> Juraj Marcin <[email protected]> wrote:
>
> > Hi Stefano,
> >
> > On 2026-01-27 19:21, Stefano Brivio wrote:
> > > [Cc'ing Laurent and David]
> > >
> > > On Tue, 27 Jan 2026 15:03:06 +0100
> > > Juraj Marcin <[email protected]> wrote:
> > >
> > > > During switchover there is a period during which both the source and
> > > > destination side VMs are paused. During this period, all network
> > > > packets are still routed to the source side, but it will never
> > > > process them. Once the destination resumes, it is not aware of these
> > > > packets and they are lost. This can cause packet loss in unreliable
> > > > protocols and extended delays due to retransmission in reliable
> > > > protocols.
> > > >
> > > > This series resolves this problem by caching packets received once
> > > > the source VM pauses and then passing and injecting them on the
> > > > destination side. This feature is implemented in the last patch. The
> > > > caching and injecting are implemented using the network filter
> > > > interface and should work with any backend with vhost=off, but only
> > > > the TAP network backend was explicitly tested.
> > >
> > > I haven't had a chance to try this change with passt(1) yet (the
> > > backend can be enabled using "-net passt" or by starting it
> > > separately).
> > >
> > > Given that passt implements migration on its own (in deeper detail in
> > > some sense, as TCP connections are preserved if IP addresses match), I
> > > wonder if this might affect or break it somehow.
> > >
> > > Did you perhaps have some thoughts about that already?
> >
> > I'm aware of passt migrating its state and passt-repair, but I also
> > haven't tested it as I couldn't get passt-repair to work.
>
> Oops. Let me know if you're hitting any specific error I could look
> into.

I tried it using the documentation [1] I found earlier; however, it
doesn't work when migrating on the same host, which is exactly the case
I expected it to cover. The destination passt process fails to take over
the port the outside TCP server is communicating with, and I still see
the connection as established with the source passt process.

This is the specific error message from the destination passt process:

  Flow 0 (TCP connection): Failed to connect migrated socket: Cannot assign requested address

[1]: https://www.qemu.org/docs/master/system/devices/net.html#example-of-migration-of-a-guest-on-the-same-host
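For what it's worth, my rough guess at what goes wrong on the same host
(this is only an untested sketch of how I understand a TCP_REPAIR
restore works, not passt's actual code, and restore_conn() is a made-up
name): the destination instance has to recreate the connection with the
same local address and port while the source instance still owns the
identical 4-tuple, so connect() on the repaired socket would fail with
EADDRNOTAVAIL, which is the "Cannot assign requested address" above.

/* Rough, untested sketch of a repair-mode restore (based on my reading
 * of how TCP_REPAIR is meant to be used, not on passt's code). */
#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

#ifndef TCP_REPAIR
#define TCP_REPAIR 19   /* value from linux/tcp.h */
#endif

/* Recreate an established connection from its migrated 4-tuple. */
static int restore_conn(const struct sockaddr_in *local,
                        const struct sockaddr_in *peer)
{
    int one = 1;
    int s = socket(AF_INET, SOCK_STREAM, 0);

    if (s < 0)
        return -1;

    /* In repair mode, bind() and connect() rebuild the connection
     * state without sending anything on the wire. */
    setsockopt(s, IPPROTO_TCP, TCP_REPAIR, &one, sizeof(one));
    setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));

    if (bind(s, (const struct sockaddr *)local, sizeof(*local)) < 0 ||
        connect(s, (const struct sockaddr *)peer, sizeof(*peer)) < 0) {
        /* On the same host, the source instance still owns the
         * identical local address/port and 4-tuple, so this is
         * (I assume) where EADDRNOTAVAIL comes from. */
        perror("Failed to connect migrated socket");
        close(s);
        return -1;
    }

    /* ...plus restoring sequence numbers, queues and options, which
     * I'm leaving out here... */
    return s;
}

Please correct me if I'm misreading where it actually fails.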
>
> I plan anyway to try out your changes but I might need a couple of days
> before I find the time.
>
> > Does it also handle other protocols, or does it just preserve TCP
> > connections?
>
> Layer-4-wise, we have an internal representation of UDP "flows"
> (observed flows of packets for which we preserve the same source port
> mapping, with timeouts) and we had a vague idea of migrating those as
> well, but it's debatable whether there's any benefit from it.
>
> At Layer 2 and 3, we migrate IP and MAC addresses we observed from the
> guest:
>
>   https://passt.top/passt/tree/migrate.c?id=e3f70c05bad90368a1a89bf31a9015125232b9ae#n31
>
> so that we have ARP and NDP resolution, as well as any NAT mapping
> working right away as needed.
>
> For completeness, this is the TCP context we migrate instead:
>
>   https://passt.top/passt/tree/tcp_conn.h?id=e3f70c05bad90368a1a89bf31a9015125232b9ae#n108
>   https://passt.top/passt/tree/tcp_conn.h?id=e3f70c05bad90368a1a89bf31a9015125232b9ae#n154
>
> > The main focus of this feature is protocols that cannot handle packet
> > loss on their own, in environments where the IP address is preserved
> > (and thus also TCP connections).
>
> Well, strictly speaking, TCP handles packet loss, that's actually the
> main reason behind it. I guess this is to improve throughput and avoid
> latency spikes or retransmissions that could be avoided?

Sorry, I actually meant that all connections are preserved. The main
goal is to prevent losses with protocols other than TCP when possible,
which was requested by our Solution Architects. A possible improvement
in TCP throughput due to avoided retransmissions is just a side effect
of that.

> > So, mainly tap/bridge, with the idea that other network backends could
> > also benefit from it. However, if it causes problems with other
> > backends, I could limit it just to tap.
>
> I couldn't quite figure out yet if it's beneficial, useless, or
> harmless for passt. With passt, what happens without your
> implementation is:
>
> 1. guest pauses
>
> 2. the source instance of passt starts migrating, meaning that sockets
>    are frozen one by one, their receiving and sending queues dumped
>
> 3. pending queues are sent to the target instance of passt, which opens
>    sockets and refills queues as needed
>
> 4. target guest resumes and will get any traffic that was received by
>    the source instance of passt between 1. and 2.
>
> Right now there's still a Linux kernel issue we observed (see also
> https://pad.passt.top/p/TcpRepairTodo, that's line 4 there) which might
> cause segments to be received (and acknowledged!) on sockets of the
> source instance of passt for a small time period *after* we freeze them
> with TCP_REPAIR (that is, TCP_REPAIR doesn't really freeze the queue).
>
> I'm currently working on a proper fix for that. Until then, point 2.
> above isn't entirely accurate (but it only happens if you hammer it
> with traffic generators, it's not really visible otherwise).
>
> With your implementation, I guess:
>
> 1. guest pauses
>
> 2. the source instance of passt starts migrating, meaning that sockets
>    are frozen one by one, their receiving and sending queues dumped
>
> 2a. any data received by QEMU after 1. will be stored and forwarded to
>     the target later. But passt at this point prevents the guest from
>     getting any data, so there should be no data involved
>
> 3. pending queues are sent to the target instance of passt, which opens
>    sockets and refills queues as needed
>
> 3a. the target guest gets the data from 2a. As long as there's no data
>     (as I'm assuming), there should be no change. If there's data coming
>     in at this point, we risk that sequences don't match anymore? I'm
>     not sure
>
> 4. target guest resumes and will *also* get any traffic that was
>    received by the source instance of passt between 1. and 2.
>
> So if my assumption from 2a. above holds, it should be useless, but
> harmless.
>
> Would your implementation help with the kernel glitch we're currently
> observing? I don't think so, because your implementation would only
> play a role between passt and QEMU, and we don't have issues there.
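To make 2a. and 3a. a bit more concrete, the caching in the last patch
boils down to the following idea (a heavily simplified sketch for
illustration, not the actual patch, and all names and types here are
made up): packets coming from the backend towards the guest are queued
while the source VM is paused, the queue is transferred to the
destination, and the destination injects the queued packets after the
guest resumes.

/* Heavily simplified sketch of the caching idea (not the actual patch;
 * names and types are made up for illustration). */
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct cached_pkt {
    struct cached_pkt *next;
    size_t len;
    unsigned char data[];
};

struct pkt_cache {
    bool caching;               /* true while the source VM is paused */
    struct cached_pkt *head, **tail;
};

static void cache_init(struct pkt_cache *c)
{
    c->caching = false;
    c->head = NULL;
    c->tail = &c->head;
}

/* Called for every packet going from the backend towards the guest.
 * Returns true if the packet was consumed (cached) by the filter. */
static bool cache_receive(struct pkt_cache *c, const void *buf, size_t len)
{
    struct cached_pkt *p;

    if (!c->caching)
        return false;           /* pass through to the guest as usual */

    p = malloc(sizeof(*p) + len);
    if (!p)
        return false;           /* on allocation failure, don't cache */

    p->next = NULL;
    p->len = len;
    memcpy(p->data, buf, len);
    *c->tail = p;
    c->tail = &p->next;
    return true;
}

/* On the destination, after the guest resumes: replay everything that
 * was cached (and transferred) in order, then go back to pass-through. */
static void cache_flush(struct pkt_cache *c,
                        void (*inject)(const void *buf, size_t len))
{
    while (c->head) {
        struct cached_pkt *p = c->head;

        c->head = p->next;
        inject(p->data, p->len);
        free(p);
    }
    c->tail = &c->head;
    c->caching = false;
}

So, as you say, if passt delivers no data to QEMU during that window,
the cache simply stays empty and nothing changes on the destination.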
> Well, it would be good to try things out. Other than that, unless I'm
> missing something, your implementation should probably be skipped for
> passt for simplicity, and also to avoid negatively affecting downtime.

I agree with skipping passt in that case, although I haven't observed
any effect on downtime. Cached network packets are sent only after the
destination resumes; by that point the network knows about the new
location of the VM, so the source shouldn't receive any more packets
intended for it.

> Note that you can also use passt without "-net passt" (that's actually
> quite recent) but with a tap back-end. Migration is only supported with
> vhost-user enabled though, and as far as I understand your
> implementation is disabled in that case?

As of now, it is disabled in that case, as network filters don't support
vhost.

> --
> Stefano

--
Juraj Marcin
