On Wed 19 Dec 2018 at 04:27, Cong Wang <xiyou.wangc...@gmail.com> wrote:
> On Mon, Dec 17, 2018 at 2:30 AM Jiri Pirko <j...@resnulli.us> wrote:
>>
>> Sun, Dec 16, 2018 at 07:52:18PM CET, xiyou.wangc...@gmail.com wrote:
>> > On Sun, Dec 16, 2018 at 8:32 AM Vlad Buslov <vla...@mellanox.com> wrote:
>> >>
>> >> On Thu 13 Dec 2018 at 23:32, Cong Wang <xiyou.wangc...@gmail.com> wrote:
>> >> > On Tue, Dec 11, 2018 at 2:19 AM Vlad Buslov <vla...@mellanox.com> wrote:
>> >> >>
>> >> >> As a part of the effort to remove the dependency on rtnl lock, cls
>> >> >> API is being converted to use fine-grained locking mechanisms
>> >> >> instead of the global rtnl lock. However, the chain_head_change
>> >> >> callback for the ingress Qdisc is a sleeping function and cannot be
>> >> >> executed while holding a spinlock.
>> >> >
>> >> > Why does it have to be a spinlock and not a mutex?
>> >> >
>> >> > I've read your cover letter and this changelog, and I don't find any
>> >> > answer.
>> >>
>> >> My initial implementation used a mutex. However, it was changed to a
>> >> spinlock at Jiri's request during internal review.
>> >
>> > So what's the answer to my question? :)
>>
>> Yeah, my concern against mutexes was that one would be needed for every
>> block and for every chain. I find that quite heavy and I believe it is
>> better to use spinlocks in those cases. This patch is a side effect of
>> that. Do you think it would be better to have mutexes instead of
>> spinlocks?
>
> My only concern with spinlocks is that we have to go async, as we can't
> block. This is almost always error-prone, especially when dealing with
> tcf block. I had to give up on a spinlock for idrinfo->lock, please take
> a look at:
>
> commit 95278ddaa15cfa23e4a06ee9ed7b6ee0197c500b
> Author: Cong Wang <xiyou.wangc...@gmail.com>
> Date:   Tue Oct 2 12:50:19 2018 -0700
>
>     net_sched: convert idrinfo->lock from spinlock to a mutex
>
> There are indeed some cases in the kernel where we take multiple
> mutexes, for example:
>
> /*
>  * bd_mutex locking:
>  *
>  *  mutex_lock(part->bd_mutex)
>  *    mutex_lock_nested(whole->bd_mutex, 1)
>  */
>
> So, how heavy are they compared with spinlocks?
>
> Thanks.
Hi Cong,

I did a quick-and-dirty mutex-based implementation of my cls API patchset
(removed the async miniqp refactoring and changed the block and chain lock
types to mutex) and ran some benchmarks to determine the exact impact of
using mutexes on cls API performance (both the rule and the chain API).

1. Parallel flower insertion rate (create 5 million flower filters on the
same chain/prio with 10 tc instances) - same performance:

SPINLOCK:

$ time ls add_sw* | xargs -n 1 -P 100 sudo tc -force -b

real    0m12.183s
user    0m35.013s
sys     1m24.947s

MUTEX:

$ time ls add_sw* | xargs -n 1 -P 100 sudo tc -force -b

real    0m12.246s
user    0m35.417s
sys     1m25.330s

2. Parallel flower insertion rate (create 100k flower filters, each on a
new instance of chain/tp, 10 tc instances) - the mutex implementation is
~17% slower:

SPINLOCK:

$ time ls add_sw* | xargs -n 1 -P 100 sudo tc -force -b

real    2m19.285s
user    0m1.863s
sys     5m41.157s

MUTEX:

multichain$ time ls add_sw* | xargs -n 1 -P 100 sudo tc -force -b

real    2m43.495s
user    0m1.664s
sys     8m32.483s

3. Chain insertion rate with the cls chain API (create 100k empty chains,
1 tc instance) - ~3% difference:

SPINLOCK:

$ time sudo tc -b add_chains

real    0m46.822s
user    0m0.239s
sys     0m46.437s

MUTEX:

$ time sudo tc -b add_chains

real    0m48.600s
user    0m0.230s
sys     0m48.227s

The only case where performance is significantly impacted by the mutex is
the second test. This happens because chain lookup is a linear search that
is performed while holding the highly contended block lock. The perf
profile for the mutex version confirms this:

+   91.83%     0.00%  tc   [kernel.vmlinux]  [k] entry_SYSCALL_64_after_hwframe
+   91.83%     0.00%  tc   [kernel.vmlinux]  [k] do_syscall_64
+   91.80%     0.00%  tc   libc-2.25.so      [.] __libc_sendmsg
+   91.78%     0.00%  tc   [kernel.vmlinux]  [k] __sys_sendmsg
+   91.77%     0.00%  tc   [kernel.vmlinux]  [k] ___sys_sendmsg
+   91.77%     0.00%  tc   [kernel.vmlinux]  [k] sock_sendmsg
+   91.77%     0.00%  tc   [kernel.vmlinux]  [k] netlink_sendmsg
+   91.77%     0.01%  tc   [kernel.vmlinux]  [k] netlink_unicast
+   91.77%     0.00%  tc   [kernel.vmlinux]  [k] netlink_rcv_skb
+   91.71%     0.01%  tc   [kernel.vmlinux]  [k] rtnetlink_rcv_msg
+   91.69%     0.03%  tc   [kernel.vmlinux]  [k] tc_new_tfilter
+   90.96%    26.45%  tc   [kernel.vmlinux]  [k] __tcf_chain_get
+   64.30%     0.01%  tc   [kernel.vmlinux]  [k] __mutex_lock.isra.7
+   39.36%    39.18%  tc   [kernel.vmlinux]  [k] osq_lock
+   24.92%    24.81%  tc   [kernel.vmlinux]  [k] mutex_spin_on_owner

Based on these results we can conclude that use cases with a significant
number of chains on a single block will be affected by using a mutex in
the cls API.

Regards,
Vlad
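
P.S. To make it clearer why only test 2 degrades: the lookup-or-create path
behind __tcf_chain_get() in the profile above is roughly the following (a
simplified sketch based on my patchset; the function name, field names and
helpers here are approximate, and refcounting/error handling are omitted):

/* Sketch only: the whole linear walk of the block's chain list runs with
 * the per-block lock held (a mutex in this implementation), so with 100k
 * distinct chains every tc_new_tfilter() request serializes on this lock
 * for the duration of the search. */
static struct tcf_chain *chain_lookup_or_create(struct tcf_block *block,
						u32 chain_index)
{
	struct tcf_chain *chain;

	mutex_lock(&block->lock);
	list_for_each_entry(chain, &block->chain_list, list) {
		if (chain->index == chain_index)
			goto out;
	}
	chain = tcf_chain_create(block, chain_index);
out:
	mutex_unlock(&block->lock);
	return chain;
}

The mutex mostly adds the osq_lock/mutex_spin_on_owner overhead seen above
on top of that long critical section; with a single chain (test 1) or a
single writer (test 3) the lock is barely contended and the two lock types
perform about the same.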