On Tue, 14 Oct 2025 15:04:14 +0200
Christian König <[email protected]> wrote:

> On 14.10.25 14:44, Zhaoyang Huang wrote:
> > On Tue, Oct 14, 2025 at 7:59 PM Christian König
> > <[email protected]> wrote:  
> >>
> >> On 14.10.25 10:32, zhaoyang.huang wrote:  
> >>> From: Zhaoyang Huang <[email protected]>
> >>>
> >>> A single dma-buf allocation can be dozens of MB or more, which
> >>> introduces a loop allocating several thousand order-0 pages.
> >>> Furthermore, concurrent allocations can push the dma-buf allocation
> >>> into direct reclaim during that loop. This commit eliminates both
> >>> effects by introducing alloc_pages_bulk_list into dma-buf's order-0
> >>> allocation. The patch has proved conditionally helpful for an 18MB
> >>> allocation, decreasing the time from 24604us to 6555us, and does no
> >>> harm when bulk allocation can't be done (it falls back to single
> >>> page allocation).  
> >>
> >> Well that sounds like an absolutely horrible idea.
> >>
> >> See, the handling of allocating only from specific orders is *exactly*
> >> there to avoid the behavior of bulk allocation.
> >>
> >> What this patch seems to do is take the behavior of avoiding large-chunk
> >> allocations from the buddy and add, on top of it, the behavior of
> >> allocating large chunks from the buddy because that is faster.  
> > Hmm, this patch doesn't change the order-8 and order-4 allocation
> > behaviour; it just replaces the loop of order-0 allocations with a
> > single bulk allocation on the fallback path. What is your concern
> > about this?  
> 
> As far as I know, the bulk allocation favors splitting large pages into 
> smaller ones rather than allocating smaller pages first. That's where the 
> performance benefit comes from.
> 
> But that is exactly what we try to avoid here by allocating only certain 
> order of pages.

This is a good question, actually. Yes, bulk alloc will split large
pages if there are insufficient pages on the pcp free list. But is
dma-buf indeed trying to avoid it, or is it merely using an inefficient
API? And does it need the extra speed? Even if it leads to increased
fragmentation?

Petr T
