> From: David Miller <da...@davemloft.net>
> Sent: Sunday, September 27, 2020 01:16
> To: sa...@kernel.org
> Cc: k...@kernel.org; netdev@vger.kernel.org; Yevgeny Kliteynik
> <klit...@nvidia.com>; Erez Shitrit <ere...@nvidia.com>; Mark Bloch
> <mbl...@nvidia.com>; Saeed Mahameed <sae...@nvidia.com>
> Subject: Re: [net-next 01/15] net/mlx5: DR, Add buddy allocator utilities
> 
> External email: Use caution opening links or attachments
> 
> 
> From: sa...@kernel.org
> Date: Fri, 25 Sep 2020 12:37:55 -0700
> 
> > From: Yevgeny Kliteynik <klit...@nvidia.com>
> >
> > Add implementation of SW Steering variation of buddy allocator.
> >
> > The buddy system for ICM memory uses 2 main data structures:
> >   - Bitmap per order, that keeps the current state of allocated
> >     blocks for this order
> >   - Indicator for the number of available blocks per each order
> >
> > In addition, there is one more hierarchy of searching in the bitmaps
> > in order to accelerate the search of the next free block which done
> > via find-first function:
> > The buddy system spends lots of its time in searching the next free
> > space using function find_first_bit, which scans a big array of long
> > values in order to find the first bit. We added one more array of
> > longs, where each bit indicates a long value in the original array,
> > that way there is a need for much less searches for the next free area.
> >
> > For example, for the following bits array of 128 bits where all bits
> > are zero except for the last bit  :  0000........00001 the
> > corresponding bits-per-long array is:  0001
> >
> > The search will be done over the bits-per-long array first, and after
> > the first bit is found, we will use it as a start pointer in the
> > bigger array (bits array).
> >
> > Signed-off-by: Erez Shitrit <ere...@nvidia.com>
> > Signed-off-by: Yevgeny Kliteynik <klit...@nvidia.com>
> > Reviewed-by: Mark Bloch <mbl...@nvidia.com>
> > Signed-off-by: Saeed Mahameed <sae...@nvidia.com>
> 
> Instead of a bits-per-long array, it seems so much simpler and more cache
> friendly to maintain instead just a "lowest set bit" value.
> 
> In the initial state it is zero for all values, remember this is just a hint.
> 
> When you allocate, if num_free is non-zero of course, you start the bit scan
> from the "lowest set bit" value.  When the set bit is found, update the 
> "lowest
> set bit" cache to the set bit plus one (or zero if the new value exceeds the 
> bitmap
> size).

This will work when you allocate everything and freeing "in order".
In our vase the system is more dynamic - we allocate, and then some chunks
are freed all over, creating free spots in the bit array in a rather random 
manner.
By replacing the bits-per-long array with a single counter we loose this ability
to jump faster to the free spot.
It is still an improvement to start from a certain place and not from the 
beginning,
but bits-per-long array saves more time.
Also, in terms of memory, it's not a big hit - it's bit per 32 chunks (or byte 
per 256 chunks).

-- YK

> Then on free you update "lowest set bit" to the bit being set if it is 
> smaller than
> the current "lowest set bit" value for that order.
> 
> No double scanning of bitmap arrays, just a single bitmap search with a 
> variable
> start point.

Reply via email to