Ilan Tayari <il...@mellanox.com> wrote:
> I debugged a little the regression I told you about the other day...
>
> Steps and Symptoms:
> 1. Set up a host-to-host IPSec tunnel (or transport, doesn't matter)
> 2. Ping over IPSec, or do something to populate the pcpu cache
> 3. Join a MC group, then leave MC group
> 4. Try to ping again using same CPU as before -> traffic doesn't egress the
>    machine at all
>
> If trying from another CPU (with clean cache), it pings well.
> If clearing the pcpu cache, it works well again.

Yes, I think I see the problem, thanks for debugging this.

I dropped the stale_bundle() check vs. the RFC version. That was a stupid
thing to do, because that check is what would have detected this.

Does this help?

diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -1818,7 +1818,8 @@ xfrm_resolve_and_create_bundle(struct xfrm_policy **pols, int num_pols,
 	    xdst->num_pols == num_pols &&
 	    !xfrm_pol_dead(xdst) &&
 	    memcmp(xdst->pols, pols,
-		   sizeof(struct xfrm_policy *) * num_pols) == 0) {
+		   sizeof(struct xfrm_policy *) * num_pols) == 0 &&
+	    xfrm_bundle_ok(xdst)) {
 		dst_hold(&xdst->u.dst);
 		return xdst;
 	}
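
For anyone following the thread, here is a rough userspace sketch of what the
extra condition changes. It is not the kernel code: struct policy,
struct bundle, current_gen, bundle_ok() and cache_lookup() are all
hypothetical stand-ins, and a single generation counter is only a simplified
assumption about how a cached bundle might be detected as stale. The point it
illustrates is that a cache hit should require both "same policies" and
"entry still valid".

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_POLS 4

/* Hypothetical stand-in for a policy; not struct xfrm_policy. */
struct policy {
	int id;
};

/* Hypothetical stand-in for a cached bundle (xdst in the patch). */
struct bundle {
	int num_pols;
	const struct policy *pols[MAX_POLS];
	unsigned int gen;	/* generation the bundle was built against */
	int refcnt;
};

/* Assumed to be bumped by any event that invalidates cached bundles. */
static unsigned int current_gen;

/* Simplified validity check, playing the role of xfrm_bundle_ok(). */
static bool bundle_ok(const struct bundle *b)
{
	return b->gen == current_gen;
}

/*
 * Return the cached bundle only if it matches the requested policies
 * AND is still valid. Without the bundle_ok() test, a stale entry
 * would keep being handed out on every lookup from this cache.
 */
static struct bundle *cache_lookup(struct bundle *cached,
				   const struct policy **pols, int num_pols)
{
	if (cached &&
	    cached->num_pols == num_pols &&
	    memcmp(cached->pols, pols,
		   sizeof(struct policy *) * num_pols) == 0 &&
	    bundle_ok(cached)) {
		cached->refcnt++;	/* analogous to dst_hold() */
		return cached;
	}
	return NULL;	/* caller falls back to building a fresh bundle */
}
```

In this sketch, a lookup after current_gen has changed returns NULL, so the
caller rebuilds the bundle instead of reusing the stale one. That would match
the reported symptom that only the CPU with a populated cache was affected.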