2015-06-16 17:57 GMT+09:00 Jesper Dangaard Brouer <bro...@redhat.com>:
> On Tue, 16 Jun 2015 10:21:10 +0200
> Jesper Dangaard Brouer <bro...@redhat.com> wrote:
>
>> On Tue, 16 Jun 2015 16:28:06 +0900 Joonsoo Kim <iamjoonsoo....@lge.com> wrote:
>>
>> > Is this really better than just calling __kmem_cache_free_bulk()?
>>
>> Yes, as can be seen by cover-letter, but my cover-letter does not seem
>> to have reached mm-list.
>>
>> Measurements for the entire patchset:
>>
>> Bulk - Fallback bulking           - fastpath-bulking
>>   1 -  47 cycles(tsc) 11.921 ns -  45 cycles(tsc) 11.461 ns improved  4.3%
>>   2 -  46 cycles(tsc) 11.649 ns -  28 cycles(tsc)  7.023 ns improved 39.1%
>>   3 -  46 cycles(tsc) 11.550 ns -  22 cycles(tsc)  5.671 ns improved 52.2%
>>   4 -  45 cycles(tsc) 11.398 ns -  19 cycles(tsc)  4.967 ns improved 57.8%
>>   8 -  45 cycles(tsc) 11.303 ns -  17 cycles(tsc)  4.298 ns improved 62.2%
>>  16 -  44 cycles(tsc) 11.221 ns -  17 cycles(tsc)  4.423 ns improved 61.4%
>>  30 -  75 cycles(tsc) 18.894 ns -  57 cycles(tsc) 14.497 ns improved 24.0%
>>  32 -  73 cycles(tsc) 18.491 ns -  56 cycles(tsc) 14.227 ns improved 23.3%
>>  34 -  75 cycles(tsc) 18.962 ns -  58 cycles(tsc) 14.638 ns improved 22.7%
>>  48 -  80 cycles(tsc) 20.049 ns -  64 cycles(tsc) 16.247 ns improved 20.0%
>>  64 -  87 cycles(tsc) 21.929 ns -  74 cycles(tsc) 18.598 ns improved 14.9%
>> 128 -  98 cycles(tsc) 24.511 ns -  89 cycles(tsc) 22.295 ns improved  9.2%
>> 158 - 101 cycles(tsc) 25.389 ns -  93 cycles(tsc) 23.390 ns improved  7.9%
>> 250 - 104 cycles(tsc) 26.170 ns - 100 cycles(tsc) 25.112 ns improved  3.8%
>>
>> I'll do a compare against the previous patch, and post the results.
>
> Compare against previous patch:
>
> Run: previous-patch - this patch
>  1 - 49 cycles(tsc) 12.378 ns - 43 cycles(tsc) 10.775 ns improved 12.2%
>  2 - 37 cycles(tsc)  9.297 ns - 26 cycles(tsc)  6.652 ns improved 29.7%
>  3 - 33 cycles(tsc)  8.348 ns - 21 cycles(tsc)  5.347 ns improved 36.4%
>  4 - 31 cycles(tsc)  7.930 ns - 18 cycles(tsc)  4.669 ns improved 41.9%
>  8 - 30 cycles(tsc)  7.693 ns - 17 cycles(tsc)  4.404 ns improved 43.3%
> 16 - 32 cycles(tsc)  8.059 ns - 17 cycles(tsc)  4.493 ns improved 46.9%
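(For context, and assuming I read the series correctly: the "Fallback bulking"
column above is essentially the generic loop sketched below, where every object
goes through the normal kmem_cache_free() path one at a time.)

#include <linux/slab.h>

/* Generic fallback sketch: free each object individually through the
 * regular per-object path; no batching per slab page.
 */
void __kmem_cache_free_bulk(struct kmem_cache *s, size_t size, void **p)
{
	size_t i;

	for (i = 0; i < size; i++)
		kmem_cache_free(s, p[i]);
}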
So, in your test, most of the objects probably come from one or two slabs, and your algorithm is well optimized for that case. But is that workload the normal case? If most of the objects come from many different slabs, the bulk free API will be enabling/disabling interrupts very often, so I guess it would work worse than just calling __kmem_cache_free_bulk(). Could you test this case, perhaps along the lines of the sketch below? Thanks.
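Something like this is what I have in mind (an untested, hypothetical sketch;
scatter_bulk_free_test(), NR_OBJS and the shuffle are made up for illustration,
only kmem_cache_free_bulk() is the API from your series):

#include <linux/kernel.h>
#include <linux/random.h>
#include <linux/slab.h>
#include <linux/vmalloc.h>

#define NR_OBJS 16384	/* enough objects to span many slab pages */

/* Hypothetical test: shuffle the pointer array so that neighbouring
 * entries rarely come from the same slab page, then bulk free them.
 */
static int scatter_bulk_free_test(struct kmem_cache *s)
{
	void **objs;
	size_t i;

	objs = vmalloc(NR_OBJS * sizeof(void *));
	if (!objs)
		return -ENOMEM;

	for (i = 0; i < NR_OBJS; i++) {
		objs[i] = kmem_cache_alloc(s, GFP_KERNEL);
		if (!objs[i])
			goto out_partial;
	}

	/* Fisher-Yates shuffle: interleave objects from many pages. */
	for (i = NR_OBJS - 1; i > 0; i--) {
		size_t j = get_random_int() % (i + 1);

		swap(objs[i], objs[j]);
	}

	/* This is the call to time against a plain kmem_cache_free() loop. */
	kmem_cache_free_bulk(s, NR_OBJS, objs);

	vfree(objs);
	return 0;

out_partial:
	while (i--)
		kmem_cache_free(s, objs[i]);
	vfree(objs);
	return -ENOMEM;
}

The intent is that consecutive objects in the array rarely belong to the same
slab page, which should be close to the worst case I describe above.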