On 17.11.2017 21:52, Eric W. Biederman wrote:
> Kirill Tkhai <ktk...@virtuozzo.com> writes:
> 
>> On 15.11.2017 19:31, Eric W. Biederman wrote:
>>> Kirill Tkhai <ktk...@virtuozzo.com> writes:
>>>
>>>> On 15.11.2017 12:51, Kirill Tkhai wrote:
>>>>> On 15.11.2017 06:19, Eric W. Biederman wrote:
>>>>>> Kirill Tkhai <ktk...@virtuozzo.com> writes:
>>>>>>
>>>>>>> On 14.11.2017 21:39, Cong Wang wrote:
>>>>>>>> On Tue, Nov 14, 2017 at 5:53 AM, Kirill Tkhai <ktk...@virtuozzo.com> 
>>>>>>>> wrote:
>>>>>>>>> @@ -406,7 +406,7 @@ struct net *copy_net_ns(unsigned long flags,
>>>>>>>>>
>>>>>>>>>         get_user_ns(user_ns);
>>>>>>>>>
>>>>>>>>> -       rv = mutex_lock_killable(&net_mutex);
>>>>>>>>> +       rv = down_read_killable(&net_sem);
>>>>>>>>>         if (rv < 0) {
>>>>>>>>>                 net_free(net);
>>>>>>>>>                 dec_net_namespaces(ucounts);
>>>>>>>>> @@ -421,7 +421,7 @@ struct net *copy_net_ns(unsigned long flags,
>>>>>>>>>                 list_add_tail_rcu(&net->list, &net_namespace_list);
>>>>>>>>>                 rtnl_unlock();
>>>>>>>>>         }
>>>>>>>>> -       mutex_unlock(&net_mutex);
>>>>>>>>> +       up_read(&net_sem);
>>>>>>>>>         if (rv < 0) {
>>>>>>>>>                 dec_net_namespaces(ucounts);
>>>>>>>>>                 put_user_ns(user_ns);
>>>>>>>>> @@ -446,7 +446,7 @@ static void cleanup_net(struct work_struct *work)
>>>>>>>>>         list_replace_init(&cleanup_list, &net_kill_list);
>>>>>>>>>         spin_unlock_irq(&cleanup_list_lock);
>>>>>>>>>
>>>>>>>>> -       mutex_lock(&net_mutex);
>>>>>>>>> +       down_read(&net_sem);
>>>>>>>>>
>>>>>>>>>         /* Don't let anyone else find us. */
>>>>>>>>>         rtnl_lock();
>>>>>>>>> @@ -486,7 +486,7 @@ static void cleanup_net(struct work_struct *work)
>>>>>>>>>         list_for_each_entry_reverse(ops, &pernet_list, list)
>>>>>>>>>                 ops_free_list(ops, &net_exit_list);
>>>>>>>>>
>>>>>>>>> -       mutex_unlock(&net_mutex);
>>>>>>>>> +       up_read(&net_sem);
>>>>>>>>
>>>>>>>> After your patch setup_net() could run concurrently with cleanup_net(),
>>>>>>>> given that ops_exit_list() is called on error path of setup_net() too,
>>>>>>>> it means ops->exit() now could run concurrently if it doesn't have its
>>>>>>>> own lock. Not sure if this breaks any existing user.
>>>>>>>
>>>>>>> Yes, there will be possible concurrent ops->init() for a net namespace,
>>>>>>> and ops->exit() for another one. I hadn't found pernet operations, which
>>>>>>> have a problem with that. If they exist, they are hidden and not clear 
>>>>>>> seen.
>>>>>>> The pernet operations in general do not touch someone else's memory.
>>>>>>> If suddenly there is one, KASAN should show it after a while.
>>>>>>
>>>>>> Certainly the use of hash tables shared between multiple network
>>>>>> namespaces would count.  I don't rembmer how many of these we have but
>>>>>> there used to be quite a few.
>>>>>
>>>>> Could you please provide an example of hash tables, you mean?
>>>>
>>>> Ah, I see, it's dccp_hashinfo etc.
>>
>> JFI, I've checked dccp_hashinfo, and it seems to be safe.
>>
>>>
>>> The big one used to be the route cache.  With resizable hash tables
>>> things may be getting better in that regard.
>>
>> I've checked some fib-related things, and wasn't able to find that.
>> Excuse me, could you please clarify, if it's an assumption, or
>> there is exactly a problem hash table, you know? Could you please
>> point it me more exactly, if it's so.
> 
> Two things.
> 1) Hash tables are one case I know where we access data from multiple
>    network namespaces.  As such it can not be asserted that is no
>    possibility for problems.
> 
> 2) The responsible way to handle this is one patch for each set of
>    methods explaining why those methods are safe to run in parallel.
> 
>    That ensures there is opportunity for review and people are going
>    slowly enough that they actually look at these issues.
> 
> The reason I want to see this broken up is that at 200ish sets of
> methods it is too much to review all at once.

Ok, it's possible to split the changes in 400 patches, but there is
a problem with three-state (no compile, module, built-in) drivers.
Git bisect won't work anyway. Please see the description of the problem
in cover message "[PATCH RFC 00/25] Replacing net_mutex with rw_semaphore"
I sent today.
 
> I completely agree that odds are that this can be made safe and that it
> is mostly likely already safe in practically every instance.    My guess
> would be that if there are problems that need to be addressed they
> happen in one or two places and we need to find them.  If possible I
> don't want to find them after the code has shipped in a stable release.

Kirill

Reply via email to