On Tue, Nov 14, 2017 at 04:53:33PM +0300, Kirill Tkhai wrote: > Curently mutex is used to protect pernet operations list. It makes > cleanup_net() to execute ->exit methods of the same operations set, > which was used on the time of ->init, even after net namespace is > unlinked from net_namespace_list. > > But the problem is it's need to synchronize_rcu() after net is removed > from net_namespace_list(): > > Destroy net_ns: > cleanup_net() > mutex_lock(&net_mutex) > list_del_rcu(&net->list) > synchronize_rcu() <--- Sleep there for ages > list_for_each_entry_reverse(ops, &pernet_list, list) > ops_exit_list(ops, &net_exit_list) > list_for_each_entry_reverse(ops, &pernet_list, list) > ops_free_list(ops, &net_exit_list) > mutex_unlock(&net_mutex) > > This primitive is not fast, especially on the systems with many processors > and/or when preemptible RCU is enabled in config. So, all the time, while > cleanup_net() is waiting for RCU grace period, creation of new net namespaces > is not possible, the tasks, who makes it, are sleeping on the same mutex: > > Create net_ns: > copy_net_ns() > mutex_lock_killable(&net_mutex) <--- Sleep there for ages > > The solution is to convert net_mutex to the rw_semaphore. Then, > pernet_operations::init/::exit methods, modifying the net-related data, > will require down_read() locking only, while down_write() will be used > for changing pernet_list. > > This gives signify performance increase, like you may see below. There > is measured sequential net namespace creation in a cycle, in single > thread, without other tasks (single user mode): > > 1)int main(int argc, char *argv[]) > { > unsigned nr; > if (argc < 2) { > fprintf(stderr, "Provide nr iterations arg\n"); > return 1; > } > nr = atoi(argv[1]); > while (nr-- > 0) { > if (unshare(CLONE_NEWNET)) { > perror("Can't unshare"); > return 1; > } > } > return 0; > } > > Origin, 100000 unshare(): > 0.03user 23.14system 1:39.85elapsed 23%CPU > > Patched, 100000 unshare(): > 0.03user 67.49system 1:08.34elapsed 98%CPU > > 2)for i in {1..10000}; do unshare -n bash -c exit; done
Hi Kirill, This mutex has another role. You know that net namespaces are destroyed asynchronously, and the net mutex gurantees that a backlog will be not big. If we have something in backlog, we know that it will be handled before creating a new net ns. As far as I remember net namespaces are created much faster than they are destroyed, so with this changes we can create a really big backlog, can't we? There was a discussion a few month ago: https://lists.onap.org/pipermail/containers/2016-October/037509.html > > Origin: > real 1m24,190s > user 0m6,225s > sys 0m15,132s Here you measure time of creating and destroying net namespaces. > > Patched: > real 0m18,235s (4.6 times faster) > user 0m4,544s > sys 0m13,796s But here you measure time of crearing namespaces and you know nothing when they will be destroyed. Thanks, Andrei