On 07.03.2018 12:22, Kirill Tkhai wrote: > On 06.03.2018 19:50, Eric Dumazet wrote: >> On Tue, 2018-03-06 at 19:24 +0300, Kirill Tkhai wrote: >>> After unshare test kworker hangs for ages: >>> >>> $ while :; do unshare -n true; done & >>> >>> $ perf report <kworker> >>> - 88,82% 0,00% kworker/u16:0 [kernel.vmlinux] [k] >>> cleanup_net >>> cleanup_net >>> - ops_exit_list.isra.9 >>> - 85,68% igmp6_net_exit >>> - 53,31% sock_release >>> - inet_release >>> - 25,33% rawv6_close >>> - ip6mr_sk_done >>> + 23,38% synchronize_rcu >>> >>> Keep in mind, this perf report shows the time a function was >>> executing, and >>> it does not show the time, it was sleeping. But it's easy to imagine, >>> how >>> much it is sleeping, if synchronize_rcu() execution takes the most >>> time. >>> Top shows the kworker R time is less then 1%. >>> >>> This happen, because of in ip6mr_sk_done() we do too many >>> synchronize_rcu(), >>> even for the sockets, that are not referenced in mr_table, and which >>> are not >>> need it. So, the progress of kworker becomes very slow. >>> >>> The patch introduces apparent solution, and it makes ip6mr_sk_done() >>> to skip >>> synchronize_rcu() for sockets, that are not need that. After the >>> patch, >>> kworker becomes able to warm the cpu up as expected. >>> >>> Signed-off-by: Kirill Tkhai <ktk...@virtuozzo.com> >>> --- >>> net/ipv6/ip6mr.c | 4 +++- >>> 1 file changed, 3 insertions(+), 1 deletion(-) >>> >>> diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c >>> index 2a38f9de45d3..290a8d0d5eac 100644 >>> --- a/net/ipv6/ip6mr.c >>> +++ b/net/ipv6/ip6mr.c >>> @@ -1485,7 +1485,9 @@ int ip6mr_sk_done(struct sock *sk) >>> } >>> } >>> rtnl_unlock(); >>> - synchronize_rcu(); >>> + >>> + if (!err) >>> + synchronize_rcu(); >>> >> >> >> But... what is this synchronize_rcu() doing exactly ? >> >> This was added in 8571ab479a6e1ef46ead5ebee567e128a422767c >> >> ("ip6mr: Make mroute_sk rcu-based") >> >> Typically on a delete, the synchronize_rcu() would be needed before >> freeing the deleted object. >> >> But nowadays we have better way : SOCK_RCU_FREE > > Hm. I'm agree with you. This is hot path, and mroute sockets created from > userspace > will delay userspace tasks close() and exit(). Since there may be many such > sockets, > we may get a zombie task, which can't be reaped for ages. This slows down the > system > directly. > > Fix for pernet_operations works, but we need generic solution instead. > > The commit "8571ab479a6e1ef46ead5ebee567e128a422767c" says: > > ip6mr: Make mroute_sk rcu-based > > In ipmr the mr_table socket is handled under RCU. Introduce the same > for ip6mr. > > There is no pointing to improvements it invents, or to the problem it solves. > The description > looks like a cleanup. It's too expensive cleanup, if it worsens the > performance a hundred > times. > > Can't we simply revert it?! > > Yuval, do you have ideas to fix that (maybe, via SOCK_RCU_FREE suggested by > Eric)? > > We actually use rcu_dereference() in ip6mr_cache_report() only. The only user > of dereference > is sock_queue_rcv_skb(). Superficial analysis shows we purge the queue in > inet_sock_destruct().
+ So this change should be safe. > Thanks, > Kirill >