Re: [PATCH net v2 0/2] Fix race condition between TCP_REPAIR dump and data receive

Stefano Brivio Wed, 20 May 2026 01:17:25 -0700

On Wed, 20 May 2026 00:27:38 -0700
Eric Dumazet <[email protected]> wrote:

> On Wed, May 20, 2026 at 12:24 AM Laurent Vivier <[email protected]> wrote:
> >
> > On 5/20/26 06:39, Eric Dumazet wrote:  
> > > On Tue, May 19, 2026 at 7:03 PM Jakub Kicinski <[email protected]> wrote:  
> > >>
> > >> On Mon, 18 May 2026 20:34:22 +0200 Stefano Brivio wrote:  
> > >>> Stefano Brivio (2):
> > >>>    tcp: Don't accept data when socket is in repair mode  
> > >>
> > >> Not sure Eric is on board with this patch in the first place.
> > >> Sound like it's not the intended use case for REPAIR so IMO
> > >> it's up to TCP maintainers whether we want to support this.
> > >> And it's definitely not a Fix.  
> > >
> > > I am not on board. Only net-next material for sure.

I wasn't quite sure, I'll re-target it for net-next.

> > > While v2 is slightly better, we have the fundamental problem of adding
> > > stuff to TCP receive path for features that almost nobody use.
> > > (It took more than 13 years to finally complain about the race)
> > >
> > > This consumes Gigawats of power on our planet (and tons of CO2), given
> > > the trillions of packets processed every second.

Absolutely not what I want, I didn't consider that.

> > > Please at least add a static key as we did in:
> > >
> > > commit 020e71a3cf7f50c0f2c54cf2444067b76fe6d785
> > > Author: Eric Dumazet <[email protected]>
> > > Date:   Mon Oct 25 09:48:24 2021 -0700
> > >
> > >      ipv4: guard IP_MINTTL with a static key
> > >
> > >      RFC 5082 IP_MINTTL option is rarely used on hosts.

I will, indeed, thanks for providing the reference.

> > > But in my original feedback I pointed that we TCP_REPAIR should
> > > probably only insert a TCP socket into TPC established hash table
> > > only when it is ready to process incoming packets.

I looked into your original feedback, and replied, because I couldn't
understand how adding sockets to the hash table (they are already
there) would help fixing this.

Now I understand the confusion:

> > The problem is on the source side of the migration and I think removing the 
> > socket from
> > the ehash table would trigger RSTs, closing the connection on the remote 
> > side.  
> 
> RST is triggered anyway if TCP_REPAIR did not occur yet.

It's different than the problem Kuniyuki pointed to (with the link to
that proposed hack from January): there, you have a situation where the
socket doesn't exist yet because it hasn't been restored yet.

That's livepatching, on the same host / node, a different (new) usage.

But here (regardless of whether it's CRIU or passt) we're talking about
*migration* (CRIU: container, passt: VM) to a different host.

In this case, migration will be considered done / ready only after the
sockets on the *target node* are up and running. No traffic will be
routed to the target node before that point.

As long as we don't prematurely close the sockets on the source node,
there's no issue with RSTs anywhere (and we can also roll things back
on a failed migration). That's something we fixed in userspace without
bothering the kernel at all.

> This is why a netfilter solution is generally used to make sure no
> incoming packet is received until the whole state has been repaired.

See comment from Andrei here: the _main_ reason why CRIU (only by
default, though) sets up netfilter rules seems to be that it makes
things simpler *while dumping connections* in that specific case. A
separate namespace makes it straightforward.

That's not the case for passt: it doesn't need to touch netfilter at
all, and I think it's a very useful thing in terms of complexity,
security, and keeping the downtime to a minimum (other than, as Andrei
mentioned, making things generally safer).

-- 
Stefano

Re: [PATCH net v2 0/2] Fix race condition between TCP_REPAIR dump and data receive

Reply via email to