> -----Original Message-----
> From: David Ahern [mailto:dsah...@gmail.com]
> Sent: Thursday, January 11, 2018 3:21 AM
> To: Chris Mi <chr...@mellanox.com>; netdev@vger.kernel.org
> Cc: gerlitz...@gmail.com; step...@networkplumber.org;
> marcelo.leit...@gmail.com; p...@nwl.cc
> Subject: Re: [patch iproute2 v8 1/2] lib/libnetlink: Add functions
> rtnl_talk_msg and rtnl_talk_iov
>
> On 1/9/18 8:27 PM, Chris Mi wrote:
> > rtnl_talk can only send a single message to kernel. Add two functions
> > rtnl_talk_msg and rtnl_talk_iov that can send multiple messages to kernel.
> > rtnl_talk_msg takes struct msghdr * as argument.
> > rtnl_talk_iov takes struct iovec * and iovlen as arguments.
> >
> > Signed-off-by: Chris Mi <chr...@mellanox.com>
> > ---
> > include/libnetlink.h | 6 ++++
> > lib/libnetlink.c | 82 ++++++++++++++++++++++++++++++++++++++++-
> -----------
> > 2 files changed, 70 insertions(+), 18 deletions(-)
> >
> > diff --git a/include/libnetlink.h b/include/libnetlink.h index
> > a4d83b9e..e9a63dbc 100644
> > --- a/include/libnetlink.h
> > +++ b/include/libnetlink.h
> > @@ -96,6 +96,12 @@ int rtnl_dump_filter_nc(struct rtnl_handle *rth,
> > int rtnl_talk(struct rtnl_handle *rtnl, struct nlmsghdr *n,
> > struct nlmsghdr **answer)
> > __attribute__((warn_unused_result));
> > +int rtnl_talk_msg(struct rtnl_handle *rtnl, struct msghdr *m,
> > + struct nlmsghdr **answer)
> > + __attribute__((warn_unused_result));
>
> As mentioned before rtnl_talk_msg is not needed; you only need to add
> rtnl_talk_iov. The attached fixup on top of your patch removes it and adjusts
> __rtnl_talk_iov. Please roll that change into your patch.
Done. I misunderstood you previous comment.
Thanks for your patch, David.
>
>
> While testing this I noticed 2 other oddities:
>
> $ perf trace -s tc -b tc.batch
> (stddev column removed to shorten line width)
>
> Summary of events:
>
> tc (780), 1857 events, 97.9%
>
> syscall calls total min avg max
> (msec) (msec) (msec) (msec)
> --------------- -------- --------- --------- --------- ---------
> recvmsg 530 6.532 0.008 0.012 0.218
> open 269 5.429 0.012 0.020 0.117
> sendmsg 4 3.518 0.092 0.879 1.647
>
>
>
> 1. recvmsg is called twice - once to peek at message size, allocate a buffer
> and then really receive the message. That is overkill for ACKs.
>
> 2. I am using a batch file with drop filters:
>
> filter add dev eth2 ingress protocol ip pref 273 flower dst_ip
> 192.168.253.0/16 action drop
>
> and for each command tc is trying to dlopen m_drop.so:
>
> open("/usr/lib/tc//m_drop.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No
> such file or directory)
>
>
> With a patch to use a stack buffer for ACKs, the above perf summary
> becomes:
>
> $ perf trace -s tc -b tc.batch
>
> Summary of events:
>
> tc (777), 1345 events, 97.1%
>
> syscall calls total min avg max
> (msec) (msec) (msec) (msec)
> --------------- -------- --------- --------- --------- ---------
> open 269 5.510 0.013 0.020 0.160
> recvmsg 274 3.758 0.009 0.014 0.396
> sendmsg 4 3.531 0.098 0.883 1.672
>
>
> Making the open errors now the dominate overhead affecting performance.
> If tc had some smarts that it already tried that file it would avoid the
> subsequent open calls. The end result is a significant speed up compared to
> the current tc:
>
> Summary of events:
>
> tc (785), 2333 events, 98.3%
>
> syscall calls total min avg max
> (msec) (msec) (msec) (msec)
> --------------- -------- --------- --------- --------- ---------
> sendmsg 256 9.832 0.029 0.038 0.181
> open 269 5.819 0.013 0.022 0.353
> recvmsg 530 5.592 0.009 0.011 0.285
>
>
> Can you look at a follow on patch (not part of this set) to cache status of
> dlopen attempts?
Sure, I will investigate this issue.