On 24.11.2016 09:47, Ido Schimmel wrote: > On Thu, Nov 24, 2016 at 12:04:57AM +0100, Hannes Frederic Sowa wrote: >> On 23.11.2016 20:53, Ido Schimmel wrote: >>> On Wed, Nov 23, 2016 at 06:47:03PM +0100, Hannes Frederic Sowa wrote: >>>> Hmm, I think you need to read the sequence counter under rtnl_lock to >>>> have an ordering with the rest of the updates to the RCU trie. Otherwise >>>> you don't know if the fib trie has the correct view regarding to the >>>> incoming notifications as a whole. This is also necessary during restarts. >>> >>> I spent quite a lot of time thinking about this specific issue, but I >>> couldn't convince myself that the read should be done under RTNL and I'm >>> not sure I understand your reasoning. Can you please elaborate? >>> >>> If, before each notification sent, we call atomic_inc() and then call >>> atomic_read() at the end, then how can we be tricked? >> >> The race I am suspecting to happen is: >> >> <CPU0> fib_register() >> >> <CPU1> delete route by notifier >> <CPU1> enqueue delete cmd into ordered queue >> >> <CPU0> starts dump >> <CPU0> sees deleted route by CPU1 because route not yet removed from RCU >> <CPU0> enqueues route for addition > > Yea, I missed this trivial case... My mind was fixed on problems that > could happen after the dump already started. :( > > Regarding your suggestion, I think the API will be more useful if we > don't bundle fib_register() and fib_dump() together. We can do the > following instead: > > 1) Sum 'fib_seq' (doesn't need to be atomic_t anymore) from all net > namespaces under RTNL
You anyway only support init_net, no? I didn't fully understood what you mean by sum? Using one for the whole system? We already have net->ipv4.rt_genid as a per-namespace routing change counter, have you looked at that? > 2) Dump FIB tables under RCU > 3) Do 1) again > 4) Compare results from 1) and 3) and retry (according to sysctl limit) > if results differ. Before each retry the module's callback (if passed) > will be invoked. > > Sounds OK? Ah, you want to sum up all the fib_seq from all namespaces. Now I got it. Not sure if that is such a good idea actually. It might make problems later on if offloading will maybe one day become a per-netns knob for the respective admins. But semantically it should work. If it turns out to be much easier than doing it per-netns, I think this approach should work. Bye, Hannes