On Wed, 11 Oct 2017 13:10:07 +0200
Phil Sutter <p...@nwl.cc> wrote:

> On Tue, Oct 10, 2017 at 09:47:43AM -0700, Stephen Hemminger wrote:
> > On Tue, 10 Oct 2017 08:41:17 +0200
> > Michal Kubecek <mkube...@suse.cz> wrote:
> >
> > > On Mon, Oct 09, 2017 at 10:25:25PM +0200, Phil Sutter wrote:
> > > > Hi Stephen,
> > > >
> > > > On Mon, Oct 02, 2017 at 10:37:08AM -0700, Stephen Hemminger wrote:
> > > > > On Thu, 28 Sep 2017 21:33:46 +0800
> > > > > Hangbin Liu <ha...@redhat.com> wrote:
> > > > >
> > > > > > From: Hangbin Liu <liuhang...@gmail.com>
> > > > > >
> > > > > > This is an update for 460c03f3f3cc ("iplink: double the buffer
> > > > > > size also in iplink_get()"). After this update, we no longer
> > > > > > need to double the buffer size every time the number of VFs
> > > > > > increases.
> > > > > >
> > > > > > With calls like rtnl_talk(&rth, &req.n, NULL, 0), we can
> > > > > > simply remove the length parameter.
> > > > > >
> > > > > > With calls like rtnl_talk(&rth, nlh, nlh, sizeof(req)), I add
> > > > > > a new variable 'answer' to avoid overwriting data in nlh,
> > > > > > because there may be more info after nlh. This also avoids the
> > > > > > issue of the nlh buffer being too small.
> > > > > >
> > > > > > We need to free 'answer' after use.
> > > > > >
> > > > > > Signed-off-by: Hangbin Liu <liuhang...@gmail.com>
> > > > > > Signed-off-by: Phil Sutter <p...@nwl.cc>
> > > > > > ---
> > > > >
> > > > > Most uses of rtnl_talk() don't need this peek and dynamic
> > > > > sizing. Can only those places that need it be targeted?
> > > >
> > > > We could probably do that, by having a buffer on the stack in
> > > > __rtnl_talk() which is used instead of the allocated one if
> > > > 'answer' is NULL. Or maybe even introduce a dedicated API call
> > > > for the dynamically allocated receive buffer. But I really doubt
> > > > that's feasible: AFAICT, that stack buffer still needs to be
> > > > reasonably sized since the reply might be larger than the request
> > > > (reusing the request buffer would be the simplest way to tackle
> > > > this), and there is also support for extack, which may bloat the
> > > > response to arbitrary size. Hangbin has shown in his benchmark
> > > > that the overhead of the second syscall is negligible, so why
> > > > care about that and increase code complexity even further?
> > > >
> > > > Not saying it's not possible, I just doubt it's worth the effort.
> > >
> > > Agreed. The current code is based on the assumption that we can
> > > estimate the maximum reply length in advance, and the reason for
> > > this series is that this assumption turned out to be wrong. I'm
> > > afraid that if we replace it with the assumption that we can
> > > estimate the maximum reply length for most requests, with only a
> > > few exceptions, it's only a matter of time before we are proven
> > > wrong again.
> > >
> > > Michal Kubecek
> >
> > For query responses, yes, the response may be large. But for the
> > common cases of adding an address or a route, the response should
> > just be an ack or an error.
>
> And with extack, an error consists of the original request plus an
> arbitrarily sized error message, so we can't just reuse the request
> buffer and are back to "guessing" the right length again.
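For reference, the dynamic sizing under discussion is the usual netlink
two-step: peek the message length, then receive into an exactly-sized
buffer. A minimal sketch of the idea in C (an illustration of the
pattern, not the actual lib/libnetlink.c code):

    #include <stdlib.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    /* Receive one netlink message into a freshly allocated buffer.
     * Returns the message length, or -1 on error; on success the
     * caller owns *answer and must free() it. */
    static ssize_t recv_sized(int fd, char **answer)
    {
        struct iovec iov = { .iov_base = NULL, .iov_len = 0 };
        struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1 };
        ssize_t len;
        char *buf;

        /* Pass 1: MSG_PEEK | MSG_TRUNC reports the real message
         * length without dequeuing anything from the socket. */
        len = recvmsg(fd, &msg, MSG_PEEK | MSG_TRUNC);
        if (len < 0)
            return -1;

        buf = malloc(len);
        if (!buf)
            return -1;

        /* Pass 2: the actual read, into a buffer of exactly the
         * size the kernel reported. */
        iov.iov_base = buf;
        iov.iov_len = len;
        len = recvmsg(fd, &msg, 0);
        if (len < 0) {
            free(buf);
            return -1;
        }

        *answer = buf;
        return len;
    }

The peeking recvmsg() is the extra syscall whose cost the benchmark
below tries to quantify.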
> To get an idea of what we're talking about, I wrote a simple benchmark
> which adds 256 * 254 (= 65024) addresses to an interface, then removes
> them again one by one, and measured the time this takes for binaries
> built with and without Hangbin's patches:
>
> OP               Vanilla            Hangbin           Delta
> --------------------------------------------------------
> add     real  2m16.244s    real  2m27.964s    +11.72s (108.6%)
>         user  0m15.241s    user  0m17.295s    +2.054s (113.5%)
>         sys   1m40.229s    sys   1m48.239s    +8.01s  (108.0%)
>
> remove  real  1m44.950s    real  1m47.044s    +2.094s (102.0%)
>         user  0m13.899s    user  0m14.723s    +0.824s (105.9%)
>         sys   1m30.798s    sys   1m31.938s    +1.140s (101.3%)
>
> So the overhead of the second syscall and dynamic memory allocation is
> less than 10% overall. Given the short time a single call to 'ip'
> typically takes, I don't think the difference is noticeable even in
> highly performance-critical applications.
>
> Cheers, Phil
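Phil's benchmark script itself is not shown; a rough reconstruction in
C (the dummy0 device, the 10.0.x.y/32 layout, and driving one ip(8)
process per operation are all assumptions, chosen to match the "single
call to 'ip'" framing above) might look like:

    #include <stdio.h>
    #include <stdlib.h>

    /* Add (or, with "del" as argv[1], remove) 256 * 254 = 65024
     * addresses, one ip(8) invocation each. Run under time(1). */
    int main(int argc, char **argv)
    {
        const char *op = argc > 1 ? argv[1] : "add";
        char cmd[128];

        for (int x = 0; x < 256; x++) {
            for (int y = 1; y < 255; y++) {
                snprintf(cmd, sizeof(cmd),
                         "ip address %s 10.0.%d.%d/32 dev dummy0",
                         op, x, y);
                if (system(cmd) != 0)
                    return 1;
            }
        }
        return 0;
    }

Timing "bench add" and then "bench del" against both binaries gives
numbers of the same shape as the table above: one short-lived process
per operation, so most of the time goes to fork/exec and the extra
syscall accounts for only a small share.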
For a better benchmark, I generated 4 million routes, then did:

# ip -batch routes.txt

OP             Vanilla      Hangbin      Delta
-----------------------------------------------
add     real  1:25.840     1:33.677     +9.13%
        user    10.690        6.078    -56.85%
        sys   1:00.920     1:13.109    +20.00%

remove  real  2:29.881     2:25.872     -2.67%
        user    12.862        7.942    -38.25%
        sys     44.127       44.633     +1.15%

So addition is slower but deletion appears faster? If I rerun the
Vanilla test, I get about the same times. The slowdown won't impact me,
but what about large-scale users like Cumulus?
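The batch file is not shown either; a generator along these lines (the
10.x.y.z/32 scheme and the dummy0 device are again assumptions) emits
roughly 4 million route commands in the format ip -batch expects:

    #include <stdio.h>

    /* Print 64 * 256 * 256 = 4,194,304 route commands, one per
     * line, without the leading "ip" that batch mode omits.
     * Redirect stdout to routes.txt. */
    int main(void)
    {
        for (int a = 0; a < 64; a++)
            for (int b = 0; b < 256; b++)
                for (int c = 0; c < 256; c++)
                    printf("route add 10.%d.%d.%d/32 dev dummy0\n",
                           a, b, c);
        return 0;
    }

Unlike the per-address test above, batch mode keeps a single process
alive across all four million rtnl_talk() round trips, so the
per-message allocation and syscall cost is measured without fork/exec
noise.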