Hi, I've Cced netdev, maybe other people have an opinion on this too.
On Thu, Nov 19, 2020 at 01:39:29PM -0800, Nic Dade wrote: > I've been investigating a problem which happens when I use IPsec > (strongswan in userspace), ESN, the default anti-replay window (32 > seqnums), on a multi-core CPU. The small window size isn't necessary, it > just makes it easier to reproduce. Using the same SA on multiple cores has synchronization problems anyway. We curently try to solve that on the protocol level: https://datatracker.ietf.org/doc/html/draft-pwouters-multi-sa-performance > > It looks like it's the same problem which was found and patched in commit > 3b59df46a4, but that commit only applies to async case, and doesn't quite > solve the problem when it is the CPU cores which are causing it. > > The trouble is that xfrm_replay_seqhi() is called twice for each packet, > once to decide what seqhi value to use for authentication, and a second > time to decide what seq num to advance to for anti-replay. Sometimes they > are not the same value. > > Here's what happens to cause this: two packets belonging to the same ESP SA > are being received, and the packet's seqnums are > window apart. The > packets pass through xfrm_input() which does the 1st call to > xfrm_replay_seqhi(). Once the packet and seqhi have been authenticated, > x->repl->advance() == xfrm_replay_advance_esn() is called. If the higher > seqnum packet gets to advance() first, then it moves the auti-replay window > forward. Then when the second packet arrives at advance(), the call > xfrm_replay_advance_esn() makes to xfrm_replay_seqhi() thinks the seqnum < > bottom means 32-bit seqnums wrapped, and it returns seqhi+1. > xfrm_replay_advance_esn() stores that away in the xfrm_replay_state_esn for > the future. From now until the ESP SA renews, or the sender gets to the > point where seqhi really did increment, all packets will fail > authentication. Well analyzed! Yes, that can happen if the packets are processed on different cpus. > > This is b/c the seqhi which was authenticated != the seqhi which advance() > believed to be correct. > > I think it would be safer if the authenticated seqhi computed in > xfrm_input() was passed to the advance() function as an argument, rather > than having advance() recompute seqhi. That would also fix the async case. > Or the non-async case also needs to recheck. Right, using the authenticated seqhi in the advance() function makes sense. We can do away the recheck() if that fixes the problem entirely. But I did not look into this problem for some years, so not absolutely sure if that is sufficient. > > Also it seems to be that calling xfrm_replay_seqhi() while not holding > x->lock (as xfrm_input() does) is not safe, since both x->seq_hi and x->seq > are used by xfrm_replay_seqhi(), and that can race with the updates to seq > and seq_hi in xfrm_replay_advance_esn() True, indeed. > ---------------------------------------------------------- > > Note I still have some mysteries. I do know for sure (from instrumenting > the code and reproducing) that the 2nd call to xfrm_replay_seqhi() does > result in a different answer than the 1st call, and that this happens when > the packets are processed simultaneously and out of order as described > above. My mystery is that my instrumentation also indicates it's the same > CPU core (by smp_processor_id()) processing the two packets, which I cannot > explain. That is odd. Should not happen if the packets are processed on the same cpu with a sync cipher.