On Wed, Nov 08, 2017 at 09:44:51AM -0800, Tom Herbert wrote: > On Fri, Aug 18, 2017 at 3:27 PM, David Miller <da...@davemloft.net> wrote: > > From: Martin KaFai Lau <ka...@fb.com> > > Date: Fri, 18 Aug 2017 13:51:36 -0700 > > > >> It seems like that middle box specifically drops TCP_RST if it > >> does not know anything about this flow. Since the flowlabel of the TCP_RST > >> (sent in TW state) is always different, it always lands to a different > >> middle > >> box. All of these TCP_RST cannot be delivered. > > > > This really is illegal behavior. The flow label is not a flow _KEY_ > > by any definition whatsoever. > > > > Flow labels are an optimization, not a determinant for flow matching > > particularly for proper TCP state processing. > > > > I'd rather you invest all of this energy getting that vendor to fix > > their kit. > > > We're now seeing several router vendors recommending people to not use > flow labels for ECMP hashing. This is precisely because when a flow > label changes, network devices that maintain state (firewalls, NAT, > load balancers) can't deal with packets being rerouted so connections > are dropped. Unfortunately, the need for packets of a flow to always > follow the same path has become an implicit requirement that I think > we need follow at least as the default behavior. > > Martin: is there any change you could resurrect these patches? In > order to solve the general problem of making routing consistent, I > believe we want to keep sk_tx_hash consistent for the connection from > which a consistent flow label can be derived. To avoid the overhead of > a hash field in sk_common, maybe we could initially set a connection > hash to a five-tuple hash for a flow instead of a random value? So in > TW state the consistent hash can be computed on the fly.
Hi Tom, Do we really need to use the five-tupe hash? There are several places using current random hash, which looks more lightweight. To fix issue, we only need to make sure reset packet include the correct flowlabel. Like what my previous patch did, we can set tw->tw_flowlabel in tcp_time_wait based on txhash and use it reset packet. In this way we can use the random hash and not add extra field in sock. Thanks, Shaohua