> >>> After unshare test kworker hangs for ages: > >>> > >>> $ while :; do unshare -n true; done & > >>> > >>> $ perf report <kworker> > >>> - 88,82% 0,00% kworker/u16:0 [kernel.vmlinux] [k] > >>> cleanup_net > >>> cleanup_net > >>> - ops_exit_list.isra.9 > >>> - 85,68% igmp6_net_exit > >>> - 53,31% sock_release > >>> - inet_release > >>> - 25,33% rawv6_close > >>> - ip6mr_sk_done > >>> + 23,38% synchronize_rcu > >>> > >>> Keep in mind, this perf report shows the time a function was > >>> executing, and > >>> it does not show the time, it was sleeping. But it's easy to imagine, > >>> how > >>> much it is sleeping, if synchronize_rcu() execution takes the most > >>> time. > >>> Top shows the kworker R time is less then 1%. > >>> > >>> This happen, because of in ip6mr_sk_done() we do too many > >>> synchronize_rcu(), > >>> even for the sockets, that are not referenced in mr_table, and which > >>> are not > >>> need it. So, the progress of kworker becomes very slow. > >>> > >>> The patch introduces apparent solution, and it makes ip6mr_sk_done() > >>> to skip > >>> synchronize_rcu() for sockets, that are not need that. After the > >>> patch, > >>> kworker becomes able to warm the cpu up as expected. > >>> > >>> Signed-off-by: Kirill Tkhai <ktk...@virtuozzo.com> > >>> --- > >>> net/ipv6/ip6mr.c | 4 +++- > >>> 1 file changed, 3 insertions(+), 1 deletion(-) > >>> > >>> diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c > >>> index 2a38f9de45d3..290a8d0d5eac 100644 > >>> --- a/net/ipv6/ip6mr.c > >>> +++ b/net/ipv6/ip6mr.c > >>> @@ -1485,7 +1485,9 @@ int ip6mr_sk_done(struct sock *sk) > >>> } > >>> } > >>> rtnl_unlock(); > >>> - synchronize_rcu(); > >>> + > >>> + if (!err) > >>> + synchronize_rcu(); > >>> > >> > >> > >> But... what is this synchronize_rcu() doing exactly ? > >> > >> This was added in 8571ab479a6e1ef46ead5ebee567e128a422767c > >> > >> ("ip6mr: Make mroute_sk rcu-based") > >> > >> Typically on a delete, the synchronize_rcu() would be needed before > >> freeing the deleted object. > >> > >> But nowadays we have better way : SOCK_RCU_FREE > > > > Hm. I'm agree with you. This is hot path, and mroute sockets created from > userspace > > will delay userspace tasks close() and exit(). Since there may be many such > sockets, > > we may get a zombie task, which can't be reaped for ages. This slows down > the system > > directly. > > > > Fix for pernet_operations works, but we need generic solution instead. > > > > The commit "8571ab479a6e1ef46ead5ebee567e128a422767c" says: > > > > ip6mr: Make mroute_sk rcu-based > > > > In ipmr the mr_table socket is handled under RCU. Introduce the same > > for ip6mr. > > > > There is no pointing to improvements it invents, or to the problem it > > solves. > The description > > looks like a cleanup. It's too expensive cleanup, if it worsens the > performance a hundred > > times. > > > > Can't we simply revert it?! > > > > Yuval, do you have ideas to fix that (maybe, via SOCK_RCU_FREE suggested > by Eric)?
Sorry, failed to notice ip6mr_sk_done() is called unconditionally from rawv6_close(). But even with your suggested fix it should be ~resolved [as only sockets used for ip6mr would reach the sync]. Or are you claiming there's still some performance hit even with your suggested change? > > > > We actually use rcu_dereference() in ip6mr_cache_report() only. The only > user of dereference > > is sock_queue_rcv_skb(). Superficial analysis shows we purge the queue in > inet_sock_destruct(). > > + So this change should be safe. I might have misunderstood the comment from commit 4c9687098f24 ("ipmr: RCU conversion of mroute_sk") when writing this; Thought comment regarding ip_ra_destroy() meant that for the IPv6 case we DO have to make sure there's a grace-period before destroying the socket. > > > Thanks, > > Kirill > >