Niklas Hambüchen <m...@nh2.me> wrote:
> I'm sending this to netdev@vger.kernel.org even though
> http://vger.kernel.org/lkml/ still suggests linux-...@vger.kernel.org,
> because the latter seems to be inactive since 2011 and full of spam, and I
> got "unresolvable address" for it. Perhaps somebody should update the page
> that recommends it.
> Nevertheless, please let me know if here is the wrong place.

This problem is known; I asked for test feedback on this patch but never
got a response:

netfilter: nf_nat: return the same reply tuple for matching CTs

It is possible that two concurrent packets originating from the same
socket of a connection-less protocol (e.g. UDP) can end up having
different IP_CT_DIR_REPLY tuples which results in one of the packets
being dropped.

To illustrate this, consider the following simplified scenario:

1. No DNAT/SNAT/MASQUERADE rules are installed, but the nf_nat module
   is loaded.
2. Packet A and B are sent at the same time from two different threads
   via the same UDP socket which hasn't been used before (=no CT has
   been created before). Both packets have the same IP_CT_DIR_ORIGINAL
   tuple.
3. CT of A has been created and confirmed, afterwards get_unique_tuple
   is called for B. Because IP_CT_DIR_REPLY tuple (the inverse of the
   IP_CT_DIR_ORIGINAL tuple) is already taken by A's confirmed CT
   (nf_nat_used_tuple finds it), get_unique_tuple calls UDP's
   unique_tuple which returns a different IP_CT_DIR_REPLY tuple
   (usually with src port = 1024).
4. B's CT cannot get confirmed in __nf_conntrack_confirm due to the
   found IP_CT_DIR_ORIGINAL tuple of A and the different
   IP_CT_DIR_REPLY tuples, thus the packet B gets dropped.

This patch modifies nf_conntrack_tuple_taken so it doesn't consider
colliding reply tuples if the IP_CT_DIR_ORIGINAL tuples are equal.

Then, at insert time, either clash resolution is possible (new packet
has the existing/older conntrack assigned to it), or it has to be
dropped.

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 741b533148ba..07847a612adf 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -1007,6 +1007,22 @@ nf_conntrack_tuple_taken(const struct nf_conntrack_tuple *tuple,
 		}
 
 		if (nf_ct_key_equal(h, tuple, zone, net)) {
+			/* If the origin tuples are identical, we can ignore
+			 * this clashing entry: they refer to the same flow.
+			 * Do not apply nat clash resolution in this case and
+			 * let nf_ct_resolve_clash() deal with this.
+			 *
+			 * This can happen with UDP in particular, e.g. when
+			 * more than one packet is sent from same socket in
+			 * different threads.
+			 *
+			 * We would now mangle our entry and would then have to
+			 * discard it at conntrack confirm time.
+			 */
+			if (nf_ct_tuple_equal(&ignored_conntrack->tuplehash[IP_CT_DIR_ORIGINAL].tuple,
+					      &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple))
+				continue;
+
 			NF_CT_STAT_INC_ATOMIC(net, found);
 			rcu_read_unlock();
 			return 1;
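
For anyone who wants to reason about the change without a kernel tree at hand, here is a rough userspace model of steps 2 and 3 of the scenario. It is only a sketch, not kernel code: struct tuple, struct conn, reply_tuple_taken() and b_sees_reply_taken() are hypothetical stand-ins invented for illustration, and the addresses and ports are made up. The skip_same_origin flag plays the role of the check the patch adds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for struct nf_conntrack_tuple: addresses and ports only. */
struct tuple {
	uint32_t src_ip, dst_ip;
	uint16_t src_port, dst_port;
};

/* Toy stand-in for struct nf_conn: one tuple per direction. */
struct conn {
	struct tuple orig;	/* models IP_CT_DIR_ORIGINAL */
	struct tuple reply;	/* models IP_CT_DIR_REPLY */
};

static bool tuple_equal(const struct tuple *a, const struct tuple *b)
{
	return a->src_ip == b->src_ip && a->dst_ip == b->dst_ip &&
	       a->src_port == b->src_port && a->dst_port == b->dst_port;
}

/* Without NAT the reply tuple is the original tuple with both ends swapped. */
static struct tuple invert(const struct tuple *t)
{
	struct tuple r = { t->dst_ip, t->src_ip, t->dst_port, t->src_port };
	return r;
}

/*
 * Is `reply` already used by a confirmed entry in `table`?
 * With skip_same_origin set, an entry whose original tuple equals that of
 * the new conntrack is ignored, which is what the patch proposes.
 */
static bool reply_tuple_taken(const struct conn *table, size_t n,
			      const struct tuple *reply,
			      const struct conn *new_ct, bool skip_same_origin)
{
	for (size_t i = 0; i < n; i++) {
		if (!tuple_equal(&table[i].reply, reply))
			continue;
		if (skip_same_origin &&
		    tuple_equal(&table[i].orig, &new_ct->orig))
			continue;	/* same flow: leave it to insert time */
		return true;
	}
	return false;
}

/* A is already confirmed; B, from the same socket, checks its reply tuple. */
static bool b_sees_reply_taken(bool skip_same_origin)
{
	struct tuple orig = { 0x0a000001, 0x0a000002, 40000, 53 };
	struct conn a, b, table[1];

	a.orig = orig;
	a.reply = invert(&orig);
	b = a;			/* same socket, so identical tuples */
	table[0] = a;		/* A's CT is in the table */

	assert(tuple_equal(&a.orig, &b.orig));
	return reply_tuple_taken(table, 1, &b.reply, &b, skip_same_origin);
}
```

In this model, b_sees_reply_taken(false) corresponds to the current behaviour, where B's reply tuple is reported as taken and B would be given a different one; b_sees_reply_taken(true) corresponds to the patched behaviour, where A's entry is skipped and the clash is left for insert time.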