On 06/10/2025 10:49, Christian König wrote:
On 03.10.25 15:58, Tvrtko Ursulin wrote:
GPUs typically benefit from contiguous memory via reduced TLB pressure and
improved caching performance, with the maximum contiguous block size that
still yields a benefit being determined by the hardware design.

The TTM pool allocator by default tries (hard) to allocate blocks of up to
the system MAX_PAGE_ORDER. That limit varies between CPU platforms and can
also be configured via Kconfig.

If that limit is higher than what the GPU can make use of, let individual
drivers tell TTM the allocation order above which the pool allocator can
afford to make a little less effort.

We implement this by disabling direct reclaim for those allocations, which
reduces the allocation latency and lowers the demands on the page
allocator, in cases where expending this effort is not critical for the
GPU in question.

Signed-off-by: Tvrtko Ursulin <[email protected]>
Cc: Christian König <[email protected]>
Cc: Thadeu Lima de Souza Cascardo <[email protected]>
---
  drivers/gpu/drm/ttm/ttm_pool.c |  8 ++++++++
  include/drm/ttm/ttm_pool.h     | 10 ++++++++--
  2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index 0323531d216a..c0bd92259394 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -135,6 +135,7 @@ static DECLARE_RWSEM(pool_shrink_rwsem);
  static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
                                        unsigned int order)
  {
+       const unsigned int beneficial_order = ttm_pool_beneficial_order(pool);
        unsigned long attr = DMA_ATTR_FORCE_CONTIGUOUS;
        struct ttm_pool_dma *dma;
        struct page *p;
@@ -148,6 +149,13 @@ static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
                gfp_flags |= __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN |
                        __GFP_THISNODE;
+       /*
+        * Do not add latency to the allocation path for allocation orders
+        * the device told us do not bring additional performance gains.
+        */
+       if (beneficial_order && order > beneficial_order)
+               gfp_flags &= ~__GFP_DIRECT_RECLAIM;
+
        if (!ttm_pool_uses_dma_alloc(pool)) {
                p = alloc_pages_node(pool->nid, gfp_flags, order);
                if (p)
diff --git a/include/drm/ttm/ttm_pool.h b/include/drm/ttm/ttm_pool.h
index d898186765f1..b2114e2d0695 100644
--- a/include/drm/ttm/ttm_pool.h
+++ b/include/drm/ttm/ttm_pool.h
@@ -59,8 +59,9 @@ struct ttm_pool_type {
        struct list_head pages;
  };
-#define TTM_POOL_USE_DMA_ALLOC BIT(0) /* Use coherent DMA allocations. */
-#define TTM_POOL_USE_DMA32     BIT(1) /* Use GFP_DMA32 allocations. */
+#define TTM_POOL_BENEFICIAL_ORDER(n)   ((n) & 0xff) /* Max order which caller can benefit from */

Looks good in general, but I'm not 100% convinced that we want to mix this 
value into the flags.

On the one hand it makes your life easier because you don't need to change all 
the drivers using it; on the other hand, changing all the drivers would 
potentially be cleaner and document the value better.

I was not 100% convinced either, but it looked like a reasonable compromise.

My thinking was to not simply add an int after the existing two booleans, but to try and clean things up at the same time. Once I had replaced the booleans with flags, the options were to either add a new int argument or to add flags like TTM_POOL_BENEFICIAL_SIZE_2M and TTM_POOL_BENEFICIAL_SIZE_64K, on the assumption that there probably isn't a full range of page sizes to cover. But then I thought, why not just put the order in some bits? The advantages being that it adds names to the anonymous booleans and is extensible with no further churn.
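
For illustration, a driver call site under this scheme could look something like the below (the post-conversion ttm_pool_init() signature and the 2 MiB figure are my assumptions here, not part of the patch):

        /*
         * Hypothetical call site, assuming ttm_pool_init() takes the
         * flags word after the bool->flags conversion. A GPU which sees
         * no gain beyond 2 MiB mappings caps the pool's effort there.
         */
        ttm_pool_init(&bdev->pool, dev, NUMA_NO_NODE,
                      TTM_POOL_USE_DMA_ALLOC |
                      TTM_POOL_BENEFICIAL_ORDER(get_order(SZ_2M)));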

But I don't know, I am happy to change it to something else if you are sure this isn't the way.

If we add a new int then it has to have some "stick with the default" semantics, like -1 or whatnot, which is also meh. I wouldn't use zero because it feels conflated.
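
To make it concrete, the int variant I am thinking of would be roughly the below (signature sketch only, hypothetical):

        /*
         * Hypothetical alternative: an explicit argument where -1 means
         * "keep the current maximum effort" and the sentinel then needs
         * documenting at every call site.
         */
        void ttm_pool_init(struct ttm_pool *pool, struct device *dev, int nid,
                           bool use_dma_alloc, bool use_dma32,
                           int beneficial_order);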

Regards,

Tvrtko

+#define TTM_POOL_USE_DMA_ALLOC         BIT(8) /* Use coherent DMA allocations. */
+#define TTM_POOL_USE_DMA32             BIT(9) /* Use GFP_DMA32 allocations. */
/**
   * struct ttm_pool - Pool for all caching and orders
@@ -111,4 +112,9 @@ static inline bool ttm_pool_uses_dma32(struct ttm_pool *pool)
        return pool->flags & TTM_POOL_USE_DMA32;
  }
+static inline unsigned int ttm_pool_beneficial_order(struct ttm_pool *pool)
+{
+       return TTM_POOL_BENEFICIAL_ORDER(pool->flags);
+}
+
  #endif

