On Tue, Nov 14, 2017 at 10:24 AM, Shaohua Li <s...@kernel.org> wrote:
> On Wed, Nov 08, 2017 at 09:44:51AM -0800, Tom Herbert wrote:
>> On Fri, Aug 18, 2017 at 3:27 PM, David Miller <da...@davemloft.net> wrote:
>> > From: Martin KaFai Lau <ka...@fb.com>
>> > Date: Fri, 18 Aug 2017 13:51:36 -0700
>> >
>> >> It seems like that middle box specifically drops TCP_RST if it
>> >> does not know anything about this flow. Since the flowlabel of the
>> >> TCP_RST (sent in TW state) is always different, it always lands to
>> >> a different middle box. All of these TCP_RST cannot be delivered.
>> >
>> > This really is illegal behavior. The flow label is not a flow _KEY_
>> > by any definition whatsoever.
>> >
>> > Flow labels are an optimization, not a determinant for flow matching
>> > particularly for proper TCP state processing.
>> >
>> > I'd rather you invest all of this energy getting that vendor to fix
>> > their kit.
>> >
>> We're now seeing several router vendors recommending people to not use
>> flow labels for ECMP hashing. This is precisely because when a flow
>> label changes, network devices that maintain state (firewalls, NAT,
>> load balancers) can't deal with packets being rerouted so connections
>> are dropped. Unfortunately, the need for packets of a flow to always
>> follow the same path has become an implicit requirement that I think
>> we need to follow at least as the default behavior.
>>
>> Martin: is there any chance you could resurrect these patches? In
>> order to solve the general problem of making routing consistent, I
>> believe we want to keep sk_tx_hash consistent for the connection from
>> which a consistent flow label can be derived. To avoid the overhead of
>> a hash field in sk_common, maybe we could initially set a connection
>> hash to a five-tuple hash for a flow instead of a random value? So in
>> TW state the consistent hash can be computed on the fly.
>
> Hi Tom,
> Do we really need to use the five-tuple hash? There are several places
> using the current random hash, which looks more lightweight. To fix the
> issue, we only need to make sure the reset packet includes the correct
> flowlabel. Like what my previous patch did, we can set tw->tw_flowlabel
> in tcp_time_wait based on txhash and use it in the reset packet. In
> this way we can use the random hash and not add an extra field in sock.
>
Shaohua,

But that patch discards the full txhash in TW, so it's not just a
problem with the flow label. sk_tx_hash can also be used for route
selection in ECMP, port selection when we're doing tunneling, etc. The
general solution should maintain tx_hash, or be able to reconstruct it,
in any state; the flow label fix is only a point solution.

Thanks,
Tom

> Thanks,
> Shaohua
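
To illustrate the five-tuple idea discussed above, here is a minimal
userspace sketch in Python, not kernel code. It only shows why a hash
derived from the five-tuple can be recomputed in any connection state
(including TW) without storing an extra field. The names
five_tuple_hash and flow_label, the keyed BLAKE2 hash, and the example
addresses are all assumptions made for illustration; the kernel would
use its own keyed hash and its own field layout.

```python
import hashlib
import ipaddress
import struct

FLOWLABEL_MASK = 0xFFFFF  # the IPv6 flow label field is 20 bits wide


def five_tuple_hash(saddr, daddr, sport, dport, proto=6, secret=b""):
    """Hypothetical stand-in for a keyed hash over the connection five-tuple.

    The result depends only on the addresses, ports, protocol and a
    per-host secret, so it can be recomputed on the fly in any state.
    """
    data = (
        ipaddress.IPv6Address(saddr).packed
        + ipaddress.IPv6Address(daddr).packed
        + struct.pack("!HHB", sport, dport, proto)
    )
    digest = hashlib.blake2b(data, digest_size=4, key=secret).digest()
    return int.from_bytes(digest, "big")  # 32-bit value, like a tx hash


def flow_label(tx_hash):
    """Derive a 20-bit flow label from the 32-bit hash (assumed mapping)."""
    return tx_hash & FLOWLABEL_MASK


if __name__ == "__main__":
    conn = ("2001:db8::1", "2001:db8::2", 40000, 443)
    secret = b"per-host-secret"  # illustrative value only

    # Hash as it would be computed while the connection is established.
    established = five_tuple_hash(*conn, secret=secret)
    # Hash recomputed later, e.g. when sending a RST from TW state.
    time_wait = five_tuple_hash(*conn, secret=secret)

    print(hex(flow_label(established)), hex(flow_label(time_wait)))
```

Because both calls see the same five-tuple and secret, the two labels
are identical, and the same 32-bit value would also be available for
ECMP route selection or tunnel source port selection. A random
per-socket hash, by contrast, has to be stored somewhere to survive the
transition to TW.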