On 17.11.2017 21:52, Eric W. Biederman wrote:
> Kirill Tkhai <ktk...@virtuozzo.com> writes:
>
>> On 15.11.2017 19:31, Eric W. Biederman wrote:
>>> Kirill Tkhai <ktk...@virtuozzo.com> writes:
>>>
>>>> On 15.11.2017 12:51, Kirill Tkhai wrote:
>>>>> On 15.11.2017 06:19, Eric W. Biederman wrote:
>>>>>> Kirill Tkhai <ktk...@virtuozzo.com> writes:
>>>>>>
>>>>>>> On 14.11.2017 21:39, Cong Wang wrote:
>>>>>>>> On Tue, Nov 14, 2017 at 5:53 AM, Kirill Tkhai <ktk...@virtuozzo.com> wrote:
>>>>>>>>> @@ -406,7 +406,7 @@ struct net *copy_net_ns(unsigned long flags,
>>>>>>>>>
>>>>>>>>>         get_user_ns(user_ns);
>>>>>>>>>
>>>>>>>>> -       rv = mutex_lock_killable(&net_mutex);
>>>>>>>>> +       rv = down_read_killable(&net_sem);
>>>>>>>>>         if (rv < 0) {
>>>>>>>>>                 net_free(net);
>>>>>>>>>                 dec_net_namespaces(ucounts);
>>>>>>>>> @@ -421,7 +421,7 @@ struct net *copy_net_ns(unsigned long flags,
>>>>>>>>>                 list_add_tail_rcu(&net->list, &net_namespace_list);
>>>>>>>>>                 rtnl_unlock();
>>>>>>>>>         }
>>>>>>>>> -       mutex_unlock(&net_mutex);
>>>>>>>>> +       up_read(&net_sem);
>>>>>>>>>         if (rv < 0) {
>>>>>>>>>                 dec_net_namespaces(ucounts);
>>>>>>>>>                 put_user_ns(user_ns);
>>>>>>>>> @@ -446,7 +446,7 @@ static void cleanup_net(struct work_struct *work)
>>>>>>>>>                 list_replace_init(&cleanup_list, &net_kill_list);
>>>>>>>>>                 spin_unlock_irq(&cleanup_list_lock);
>>>>>>>>>
>>>>>>>>> -       mutex_lock(&net_mutex);
>>>>>>>>> +       down_read(&net_sem);
>>>>>>>>>
>>>>>>>>>         /* Don't let anyone else find us. */
>>>>>>>>>         rtnl_lock();
>>>>>>>>> @@ -486,7 +486,7 @@ static void cleanup_net(struct work_struct *work)
>>>>>>>>>         list_for_each_entry_reverse(ops, &pernet_list, list)
>>>>>>>>>                 ops_free_list(ops, &net_exit_list);
>>>>>>>>>
>>>>>>>>> -       mutex_unlock(&net_mutex);
>>>>>>>>> +       up_read(&net_sem);
>>>>>>>>
>>>>>>>> After your patch setup_net() could run concurrently with cleanup_net(),
>>>>>>>> given that ops_exit_list() is called on the error path of setup_net() too,
>>>>>>>> it means ops->exit() now could run concurrently if it doesn't have its
>>>>>>>> own lock. Not sure if this breaks any existing user.
>>>>>>>
>>>>>>> Yes, it will be possible for ops->init() for one net namespace to run
>>>>>>> concurrently with ops->exit() for another one. I haven't found pernet
>>>>>>> operations which have a problem with that. If they exist, they are hidden
>>>>>>> and not clearly seen. The pernet operations in general do not touch
>>>>>>> someone else's memory. If suddenly there is one that does, KASAN should
>>>>>>> show it after a while.
>>>>>>
>>>>>> Certainly the use of hash tables shared between multiple network
>>>>>> namespaces would count. I don't remember how many of these we have, but
>>>>>> there used to be quite a few.
>>>>>
>>>>> Could you please provide an example of the hash tables you mean?
>>>>
>>>> Ah, I see, it's dccp_hashinfo etc.
>>
>> JFI, I've checked dccp_hashinfo, and it seems to be safe.
>>
>>> The big one used to be the route cache. With resizable hash tables
>>> things may be getting better in that regard.
>>
>> I've checked some fib-related things and wasn't able to find such a table.
>> Excuse me, could you please clarify whether this is an assumption, or
>> whether there is a specific problem hash table you know of? If so, could
>> you please point me to it more exactly?
>
> Two things.
> 1) Hash tables are one case I know of where we access data from multiple
> network namespaces. As such it cannot be asserted that there is no
> possibility for problems.
>
> 2) The responsible way to handle this is one patch for each set of
> methods explaining why those methods are safe to run in parallel.
>
> That ensures there is opportunity for review and people are going
> slowly enough that they actually look at these issues.
>
> The reason I want to see this broken up is that at 200ish sets of
> methods it is too much to review all at once.
Ok, it's possible to split the changes into 400 patches, but there is a
problem with three-state (not compiled, module, built-in) drivers: git
bisect won't work anyway. Please see the description of the problem in
the cover message "[PATCH RFC 00/25] Replacing net_mutex with
rw_semaphore" I sent today.

> I completely agree that odds are that this can be made safe and that it
> is most likely already safe in practically every instance. My guess
> would be that if there are problems that need to be addressed they
> happen in one or two places and we need to find them. If possible I
> don't want to find them after the code has shipped in a stable release.

Kirill
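
P.S. To make the two cases concrete, here is a minimal, purely
illustrative sketch (the demo_* names are made up and are not part of
the series): a pernet_operations pair whose per-netns state is reached
via net_generic() and is therefore unaffected by concurrent
->init()/->exit() for different namespaces, while the one piece of
state shared between all namespaces (a stand-in for things like
dccp_hashinfo or the old route cache) takes its own spinlock instead of
relying on net_mutex:

/* Hypothetical example, assuming ->init()/->exit() may now run
 * concurrently for different namespaces under down_read(&net_sem).
 */
#include <linux/module.h>
#include <linux/hashtable.h>
#include <linux/spinlock.h>
#include <net/net_namespace.h>
#include <net/netns/generic.h>

/* State shared between ALL namespaces needs its own lock, because two
 * exits (or an init and an exit) can now touch it at the same time. */
static DEFINE_HASHTABLE(demo_shared_hash, 4);
static DEFINE_SPINLOCK(demo_shared_lock);

/* Purely per-namespace state, reached via net_generic(); no other
 * namespace can see it, so the new concurrency is harmless here. */
struct demo_net {
	struct hlist_node node;		/* links this netns into the shared hash */
	unsigned long key;
};

static unsigned int demo_net_id __read_mostly;

static int __net_init demo_net_init(struct net *net)
{
	struct demo_net *dn = net_generic(net, demo_net_id);

	dn->key = (unsigned long)net;	/* private to @net, no lock needed */

	spin_lock(&demo_shared_lock);	/* shared table: take its own lock */
	hash_add(demo_shared_hash, &dn->node, dn->key);
	spin_unlock(&demo_shared_lock);
	return 0;
}

static void __net_exit demo_net_exit(struct net *net)
{
	struct demo_net *dn = net_generic(net, demo_net_id);

	spin_lock(&demo_shared_lock);	/* shared table: take its own lock */
	hash_del(&dn->node);
	spin_unlock(&demo_shared_lock);
	/* Everything else in @dn is private to @net and needs no lock. */
}

static struct pernet_operations demo_net_ops = {
	.init = demo_net_init,
	.exit = demo_net_exit,
	.id   = &demo_net_id,
	.size = sizeof(struct demo_net),	/* allocated per netns */
};

static int __init demo_init(void)
{
	return register_pernet_subsys(&demo_net_ops);
}

static void __exit demo_exit(void)
{
	unregister_pernet_subsys(&demo_net_ops);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");

If an existing pernet subsystem follows the second pattern but relies on
net_mutex for serialization instead of its own lock, that is exactly the
kind of place the per-ops patches would need to call out.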