2015-06-16 17:57 GMT+09:00 Jesper Dangaard Brouer <bro...@redhat.com>:
> On Tue, 16 Jun 2015 10:21:10 +0200
> Jesper Dangaard Brouer <bro...@redhat.com> wrote:
>
>> On Tue, 16 Jun 2015 16:28:06 +0900 Joonsoo Kim <iamjoonsoo....@lge.com> wrote:
>>
>> > Is this really better than just calling __kmem_cache_free_bulk()?
>>
>> Yes, as can be seen by cover-letter, but my cover-letter does not seem
>> to have reached mm-list.
>>
>> Measurements for the entire patchset:
>>
>> Bulk - Fallback bulking           - fastpath-bulking
>>   1 -  47 cycles(tsc) 11.921 ns -  45 cycles(tsc) 11.461 ns improved  4.3%
>>   2 -  46 cycles(tsc) 11.649 ns -  28 cycles(tsc)  7.023 ns improved 39.1%
>>   3 -  46 cycles(tsc) 11.550 ns -  22 cycles(tsc)  5.671 ns improved 52.2%
>>   4 -  45 cycles(tsc) 11.398 ns -  19 cycles(tsc)  4.967 ns improved 57.8%
>>   8 -  45 cycles(tsc) 11.303 ns -  17 cycles(tsc)  4.298 ns improved 62.2%
>>  16 -  44 cycles(tsc) 11.221 ns -  17 cycles(tsc)  4.423 ns improved 61.4%
>>  30 -  75 cycles(tsc) 18.894 ns -  57 cycles(tsc) 14.497 ns improved 24.0%
>>  32 -  73 cycles(tsc) 18.491 ns -  56 cycles(tsc) 14.227 ns improved 23.3%
>>  34 -  75 cycles(tsc) 18.962 ns -  58 cycles(tsc) 14.638 ns improved 22.7%
>>  48 -  80 cycles(tsc) 20.049 ns -  64 cycles(tsc) 16.247 ns improved 20.0%
>>  64 -  87 cycles(tsc) 21.929 ns -  74 cycles(tsc) 18.598 ns improved 14.9%
>> 128 -  98 cycles(tsc) 24.511 ns -  89 cycles(tsc) 22.295 ns improved  9.2%
>> 158 - 101 cycles(tsc) 25.389 ns -  93 cycles(tsc) 23.390 ns improved  7.9%
>> 250 - 104 cycles(tsc) 26.170 ns - 100 cycles(tsc) 25.112 ns improved  3.8%
>>
>> I'll do a compare against the previous patch, and post the results.
>
> Compare against previous patch:
>
> Run: previous-patch - this patch
>  1 - 49 cycles(tsc) 12.378 ns - 43 cycles(tsc) 10.775 ns improved 12.2%
>  2 - 37 cycles(tsc)  9.297 ns - 26 cycles(tsc)  6.652 ns improved 29.7%
>  3 - 33 cycles(tsc)  8.348 ns - 21 cycles(tsc)  5.347 ns improved 36.4%
>  4 - 31 cycles(tsc)  7.930 ns - 18 cycles(tsc)  4.669 ns improved 41.9%
>  8 - 30 cycles(tsc)  7.693 ns - 17 cycles(tsc)  4.404 ns improved 43.3%
> 16 - 32 cycles(tsc)  8.059 ns - 17 cycles(tsc)  4.493 ns improved 46.9%
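(For context, and assuming I read the series correctly: the "Fallback bulking"
column above is essentially the generic loop sketched below, where every object
goes through the normal kmem_cache_free() path one at a time.)

#include <linux/slab.h>

/* Generic fallback sketch: free each object individually through the
 * regular per-object path; no batching per slab page.
 */
void __kmem_cache_free_bulk(struct kmem_cache *s, size_t size, void **p)
{
	size_t i;

	for (i = 0; i < size; i++)
		kmem_cache_free(s, p[i]);
}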
So, in your test, most of the objects probably come from one or two slabs, and your algorithm is well optimized for that case. But is that workload the normal case? If most of the objects come from many different slabs, the bulk free API will be enabling/disabling interrupts very often, so I guess it would work worse than just calling __kmem_cache_free_bulk(). Could you test this case, perhaps along the lines of the sketch below? Thanks.
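Something like this is what I have in mind (an untested, hypothetical sketch;
scatter_bulk_free_test(), NR_OBJS and the shuffle are made up for illustration,
only kmem_cache_free_bulk() is the API from your series):

#include <linux/kernel.h>
#include <linux/random.h>
#include <linux/slab.h>
#include <linux/vmalloc.h>

#define NR_OBJS 16384	/* enough objects to span many slab pages */

/* Hypothetical test: shuffle the pointer array so that neighbouring
 * entries rarely come from the same slab page, then bulk free them.
 */
static int scatter_bulk_free_test(struct kmem_cache *s)
{
	void **objs;
	size_t i;

	objs = vmalloc(NR_OBJS * sizeof(void *));
	if (!objs)
		return -ENOMEM;

	for (i = 0; i < NR_OBJS; i++) {
		objs[i] = kmem_cache_alloc(s, GFP_KERNEL);
		if (!objs[i])
			goto out_partial;
	}

	/* Fisher-Yates shuffle: interleave objects from many pages. */
	for (i = NR_OBJS - 1; i > 0; i--) {
		size_t j = get_random_int() % (i + 1);

		swap(objs[i], objs[j]);
	}

	/* This is the call to time against a plain kmem_cache_free() loop. */
	kmem_cache_free_bulk(s, NR_OBJS, objs);

	vfree(objs);
	return 0;

out_partial:
	while (i--)
		kmem_cache_free(s, objs[i]);
	vfree(objs);
	return -ENOMEM;
}

The intent is that consecutive objects in the array rarely belong to the same
slab page, which should be close to the worst case I describe above.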