On Sat 02 Mar 2019 at 00:08, Cong Wang <xiyou.wangc...@gmail.com> wrote:
> On Thu, Feb 28, 2019 at 6:53 AM Vlad Buslov <vla...@mellanox.com> wrote:
>>
>>
>> On Wed 27 Feb 2019 at 23:03, Cong Wang <xiyou.wangc...@gmail.com> wrote:
>> > On Tue, Feb 26, 2019 at 8:10 AM Vlad Buslov <vla...@mellanox.com> wrote:
>> >>
>> >>
>> >> On Tue 26 Feb 2019 at 00:15, Cong Wang <xiyou.wangc...@gmail.com> wrote:
>> >> > On Mon, Feb 25, 2019 at 7:45 AM Vlad Buslov <vla...@mellanox.com> wrote:
>> >> >>
>> >> >> Function tc_dump_chain() obtains and releases block->lock on each iteration
>> >> >> of its inner loop that dumps all chains on block. Outputting chain template
>> >> >> info is fast operation so locking/unlocking mutex multiple times is an
>> >> >> overhead when lock is highly contested. Modify tc_dump_chain() to only
>> >> >> obtain block->lock once and dump all chains without releasing it.
>> >> >>
>> >> >> Signed-off-by: Vlad Buslov <vla...@mellanox.com>
>> >> >> Suggested-by: Cong Wang <xiyou.wangc...@gmail.com>
>> >> >
>> >> > Thanks for the followup!
>> >> >
>> >> > Isn't it similar for __tcf_get_next_proto() in tcf_chain_dump()?
>> >> > And for tc_dump_tfilter()?
>> >>
>> >> Not really. These two dump all tp filters and not just a template, which
>> >> is O(n) on number of filters and can be slow because it calls hw offload
>> >> API for each of them. Our typical use-case involves periodic filter dump
>> >> (to update stats) while multiple concurrent user-space threads are
>> >> updating filters, so it is important for them to be able to execute in
>> >> parallel.
>> >
>> > Hmm, but if these are read-only, you probably don't even need a
>> > mutex, you can just use RCU read lock to protect list iteration
>> > and you still can grab the refcnt in the same way.
>>
>> That is how it worked in my initial implementation. However, it doesn't
>> work with hw offloads because driver callbacks can sleep.
>
> Hmm? You drop RCU read lock after grabbing the refcnt,
> right? If so what's the problem with sleeping?

Okay, I misunderstood your suggestion. In tc_dump_tfilter() we can't use
RCU in __tcf_get_next_chain() because chain reference counters are not
atomic and require the protection of block->lock. __tcf_get_next_proto()
requires chain->filter_chain_lock because it checks the 'deleting' flag
in addition to taking a reference to tp.
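
To illustrate the point about non-atomic reference counters, here is a rough user-space sketch of the pattern, not the actual kernel code. The names struct chain, struct block, get_next_chain() and walk_chains() are hypothetical stand-ins, and a pthread mutex plays the role of block->lock. Freeing a chain when its counter drops to zero is left out. Because refcnt is a plain integer, the increment is only safe while the lock is held, which is why an RCU read-side section alone would not be enough here. The visit callback runs with the lock dropped, so it is free to sleep.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the kernel's struct tcf_chain / tcf_block. */
struct chain {
	struct chain *next;
	unsigned int refcnt;	/* plain integer: only touched under block->lock */
	unsigned int index;
};

struct block {
	pthread_mutex_t lock;	/* plays the role of block->lock */
	struct chain *chains;
};

/*
 * Return the chain following 'prev' (or the first chain if 'prev' is NULL)
 * with a reference taken, and drop the reference held on 'prev'.
 * Both counter updates happen under the lock because refcnt is not atomic.
 */
static struct chain *get_next_chain(struct block *b, struct chain *prev)
{
	struct chain *next;

	pthread_mutex_lock(&b->lock);
	next = prev ? prev->next : b->chains;
	if (next)
		next->refcnt++;
	if (prev) {
		assert(prev->refcnt > 0);
		prev->refcnt--;
	}
	pthread_mutex_unlock(&b->lock);

	return next;
}

/*
 * Walk all chains and return how many were visited. The lock is not held
 * while visit() runs, so visit() may block; the reference taken in
 * get_next_chain() keeps the current chain alive in the meantime.
 */
static unsigned int walk_chains(struct block *b, void (*visit)(struct chain *))
{
	unsigned int count = 0;
	struct chain *c;

	for (c = get_next_chain(b, NULL); c; c = get_next_chain(b, c)) {
		if (visit)
			visit(c);
		count++;
	}

	return count;
}
```

With an atomic counter the lookup could in principle run under an RCU read lock instead, but that is a different design from what is described above.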