Hi Julien,

> On 9 Jun 2022, at 09:30, Julien Grall <[email protected]> wrote:
> 
> From: Hongyan Xia <[email protected]>
> 
> The idea is to split the range into multiple aligned power-of-2 regions
> which only needs to call free_heap_pages() once each. We check the least
> significant set bit of the start address and use its bit index as the
> order of this increment. This makes sure that each increment is both
> power-of-2 and properly aligned, which can be safely passed to
> free_heap_pages(). Of course, the order also needs to be sanity checked
> against the upper bound and MAX_ORDER.
> 
> Testing on a nested environment on c5.metal with various amount
> of RAM. Time for end_boot_allocator() to complete:
>            Before         After
>    - 90GB: 1426 ms        166 ms
>    -  8GB:  124 ms         12 ms
>    -  4GB:   60 ms          6 ms
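
For reference, the splitting described in the quoted commit message can be sketched roughly as below. This is only an illustration, not the code from the patch: split_range(), next_chunk_order() and SKETCH_MAX_ORDER are hypothetical names, the value 18 is an assumed stand-in for MAX_ORDER, and the loop merely counts the chunks instead of calling free_heap_pages().

```c
#include <assert.h>

/* Assumption: stand-in for the hypervisor's MAX_ORDER, value chosen for illustration. */
#define SKETCH_MAX_ORDER 18U

/* Bit index of the least significant set bit; x must be non-zero. */
static unsigned int lowest_set_bit(unsigned long x)
{
    unsigned int i = 0;

    assert(x != 0);
    while ( !(x & 1UL) )
    {
        x >>= 1;
        i++;
    }

    return i;
}

/* Largest order such that (1UL << order) <= n; n must be non-zero. */
static unsigned int floor_log2(unsigned long n)
{
    unsigned int i = 0;

    assert(n != 0);
    while ( n >>= 1 )
        i++;

    return i;
}

/*
 * Order of the next chunk starting at page number 'start' in [start, end).
 * The chunk must fit below 'end', must not exceed SKETCH_MAX_ORDER, and
 * must be naturally aligned, i.e. limited by the lowest set bit of 'start'.
 */
static unsigned int next_chunk_order(unsigned long start, unsigned long end)
{
    unsigned int order;

    assert(start < end);
    order = floor_log2(end - start);

    if ( order > SKETCH_MAX_ORDER )
        order = SKETCH_MAX_ORDER;

    /* A start of 0 is aligned to every order, so only check non-zero starts. */
    if ( start != 0 && lowest_set_bit(start) < order )
        order = lowest_set_bit(start);

    return order;
}

/*
 * Walk [start, end) in aligned power-of-2 chunks and return how many
 * chunks were needed. Real code would free each chunk at this point.
 */
static unsigned int split_range(unsigned long start, unsigned long end)
{
    unsigned int calls = 0;

    while ( start < end )
    {
        start += 1UL << next_chunk_order(start, end);
        calls++;
    }

    return calls;
}
```

With these assumptions, the page range [5, 20) is covered by four chunks of order 0, 1, 3 and 2 ([5,6), [6,8), [8,16), [16,20)) instead of fifteen single-page frees.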


On an arm64 Neoverse N1 system with 32GB of RAM I have:
- 1180 ms before
- 63 ms after

and my internal tests are passing on arm64.

Great optimisation :-)

(I will do a full review of the code in a second step).

> 
> Signed-off-by: Hongyan Xia <[email protected]>
> Signed-off-by: Julien Grall <[email protected]>

Cheers
Bertrand

