Hi, sorry re-sending because I didn't have plain text mode enabled and message didn't arrive to the mailing list. Also dropping some mail addresses from RFC authors which probed to be unavailable anymore.
The problem is not caused by the host-wide rate limit. I analyzed the scenario with tcpdump and I see the challenge ACKs are being sent, but due to network conditions sometimes they can be lost. On top of that, the RST being sent from the client to the server after receiving the challenge ACK can also sometimes be lost. I even found the client/router had a problem with iptables setup and was actually not sending back RST for incoming TCP packets without an existing connection established (should be the one already closed before with the first RST sent by the client and which originated the challenge ACK). In my test scenario I'm using MultiPathTCP and I'm recreating (destroy with RST, then create with SYN) the subflow on one of the interfaces every aprox 5 seconds while uploading a big file. Due to network conditions and/or router stopping the RSTs, after less than 5 min, I have more than 32 subflow connections created for that MPTCP connection, which is the hardcoded maximum, and from that point I'm unable to create/use new subflows until some of them are closed, which can take quite a lot of time. After this patch is applied, the number of subflows is kind of stable at 4-5 subflows during the whole upload. It's still not 2 (one permanent in iface1 and the recreated one in iface2) because sometimes due to network problems the packet before the RST is lost and thus the RST which arrives is not equal to next expected seq number from the right edge of the SACK block. But still, it makes the situation quite better, specially from user point of view. Here's an example of the issue without my patch (challenge ACK is sent although the RST is in current place). It shows server acknowledging data with SACK in first line. Then, on 2nd line, client decides to terminate the connexion and uses his next SEQ number available. On 3rd line, server answers with the challenge ACK. Then no answer comes from that challenge ack and the TCP conn is left opened. 14202 73.086360 10.0.4.2 443 10.67.15.100 53755 TCP 94 [TCP Dup ACK 14192#4] 443→53755 [ACK] Seq=3551806992 Ack=1240653972 Win=1573120 Len=0 TSval=4022818 TSecr=1106789 SLE=1240836636 SRE=1240942836 SLE=1240770084 SRE=1240835220 14203 73.086363 10.67.15.100 53755 10.0.4.2 443 TCP 74 53755→443 [RST, ACK] Seq=1240942836 Ack=3551806992 Win=90240 Len=0 TSval=1106847 TSecr=4022536 14204 73.086368 10.0.4.2 443 10.67.15.100 53755 TCP 94 [TCP Dup ACK 14192#5] 443→53755 [ACK] Seq=3551806992 Ack=1240653972 Win=1573120 Len=0 TSval=4022818 TSecr=1106789 SLE=1240836636 SRE=1240942836 SLE=1240770084 SRE=1240835220 So, the main point in here is trying to improve the situation to close the connections and free resources in some specific cases without actually going pre RFC5961 state. That would mean when a RST is received, up to 4-5 SEQs are checked to match instead of 1. I didn't contact the authors of the RFC. I CC them in this e-mail. I hope that's the right thing to do in this case and that they don't mind it in case they want to follow the topic. I will have a look at packetdrill to try to reproduce it somehow there. On Tue, May 31, 2016 at 5:12 PM, Eric Dumazet <eric.duma...@gmail.com> wrote: > On Tue, 2016-05-31 at 13:38 +0200, Pau Espin Pedrol wrote: >> RFC 5961 advises to only accept RST packets containing a seq number >> matching the next expected seq number instead of the whole receive >> window in order to avoid spoofing attacks. >> >> However, this situation is not optimal in the case SACK is in use at the >> time the RST is sent. I recently run into a scenario in which packet >> losses were high while uploading data to a server, and userspace was >> willing to frequently terminate connections by sending a RST. In >> this case, the ACK sent on the receiver side is frozen waiting for a lost >> packet retransmission and a SACK block is used to let the client >> continue uploading data. At some point later on, the client sends the >> RST, which matches the next expected seq number of the SACK block on the >> receiver side which is going forward receiving data. >> >> In this scenario, as RFC 5961 defines, the SEQ doesn't match the frozen >> main ACK at receiver side and thus gets dropped and a challenge ACK is >> sent, which gets usually lost due to network conditions. The main >> consequence is that the connection stays alive for a while even if it >> made sense to accept the RST. This can get really bad if lots of >> connections like this one are created in few seconds, allocating all the >> resources of the server easily. >> >> From security point of view: the maximum number of SACK blocks for a TCP >> connection is limited to 4 due to options field maximum length, and that >> means we match at maximum against 5 seq numbers, which should make it >> still difficult for attackers to inject a valid RST message. >> >> This patch was tested in a 3.18 kernel and probed to improve the >> situation in the scenario described above. >> >> Signed-off-by: Pau Espin Pedrol <pau.es...@tessares.net> >> --- >> net/ipv4/tcp_input.c | 18 +++++++++++++++++- >> 1 file changed, 17 insertions(+), 1 deletion(-) >> >> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c >> index d6c8f4cd0..4727dc8 100644 >> --- a/net/ipv4/tcp_input.c >> +++ b/net/ipv4/tcp_input.c >> @@ -5159,6 +5159,7 @@ static bool tcp_validate_incoming(struct sock *sk, >> struct sk_buff *skb, >> const struct tcphdr *th, int syn_inerr) >> { >> struct tcp_sock *tp = tcp_sk(sk); >> + bool rst_seq_match = false; >> >> /* RFC1323: H1. Apply PAWS check first. */ >> if (tcp_fast_parse_options(skb, th, tp) && tp->rx_opt.saw_tstamp && >> @@ -5195,13 +5196,28 @@ static bool tcp_validate_incoming(struct sock *sk, >> struct sk_buff *skb, >> >> /* Step 2: check RST bit */ >> if (th->rst) { >> - /* RFC 5961 3.2 : >> + /* RFC 5961 3.2 (extended to match against SACK too if >> available): >> * If sequence number exactly matches RCV.NXT, then >> * RESET the connection >> * else >> * Send a challenge ACK >> */ >> if (TCP_SKB_CB(skb)->seq == tp->rcv_nxt) >> + rst_seq_match = true; >> + else if (tcp_is_sack(tp)) { >> + int this_sack; >> + struct tcp_sack_block *sp = tp->rx_opt.dsack ? >> + tp->duplicate_sack : >> tp->selective_acks; >> + >> + for (this_sack = 0; this_sack < tp->rx_opt.num_sacks; >> ++this_sack) { >> + if (TCP_SKB_CB(skb)->seq == >> sp[this_sack].end_seq) { >> + rst_seq_match = true; >> + break; >> + } >> + } >> + } >> + >> + if (rst_seq_match) >> tcp_reset(sk); >> else >> tcp_send_challenge_ack(sk, skb); >> -- >> 2.5.0 > > It looks like you want to seriously relax RFC 5961 ... > > Could you have a problem because of the host-wide RFC 5961 rate limit ? > > Have you contacted RFC authors ? > > If the peer sends the RST, presumably it should answer to the challenge > ACK right away, since it does not care of the SACK blocks anymore. > > A packetdrill test demonstrating the issue would be nice. > > Thanks. > > > -- Pau Espin Pedrol | R&D Engineer - External pau.es...@tessares.net | +32 487 43 36 50 Tessares SA | Hybrid Access Solutions www.tessares.net 6 Rue Louis de Geer, 1348 Louvain-la-Neuve, Belgium -- ------------------------------ DISCLAIMER. This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error please notify the system manager. This message contains confidential information and is intended only for the individual named. If you are not the named addressee you should not disseminate, distribute or copy this e-mail. Please notify the sender immediately by e-mail if you have received this e-mail by mistake and delete this e-mail from your system. If you are not the intended recipient you are notified that disclosing, copying, distributing or taking any action in reliance on the contents of this information is strictly prohibited.