On 08/10/2025 15:34, Matthew Auld wrote:
On 08/10/2025 14:50, Tvrtko Ursulin wrote:
On 08/10/2025 13:35, Christian König wrote:
On 08.10.25 13:53, Tvrtko Ursulin wrote:
Disclaimer:
Please note that as this series includes a patch which touches a good number of drivers I will only copy everyone on the cover letter and the respective patch. The assumption is people are subscribed to dri-devel and so can look at the whole series there. I know someone is bound to complain both in the case when everyone is copied on everything, for getting too much email, and also in this other case. So please be flexible.
Description:
All drivers which use the TTM pool allocator end up requesting large order allocations when allocating large buffers. Those can be slow due to memory pressure and so add latency to buffer creation. But there is often also a size limit above which contiguous blocks do not bring any performance benefit. This series allows drivers to say when it is okay for TTM to try a bit less hard.

We do this by allowing drivers to specify this cut-off point when creating the TTM device and pools. Allocations above this size will skip direct reclaim, so worst case latency under memory pressure will improve. Background reclaim is still kicked off, and both before and after the memory pressure all the TTM pool buckets remain in use as they are today.
This is especially interesting if someone has configured MAX_PAGE_ORDER higher than the default. And even with the default, with amdgpu for example, the last patch in the series makes use of the new feature by telling TTM that above 2MiB we do not expect performance benefits, which makes TTM not try direct reclaim for the top bucket (4MiB).
The end result is that TTM drivers become a tiny bit nicer mm citizens and users benefit from better worst case buffer creation latencies. As a side benefit we get rid of two instances of those often very unreadable function signatures with multiple nameless booleans.

If this sounds interesting and gets merged, the individual drivers can follow up with patches configuring their thresholds.
v2:
* Passed the new data in by changing the function signatures. (Christian)
v3:
* Moved ttm pool helpers into new ttm_pool_internal.h. (Christian)
Patch #3 is Acked-by: Christian König <[email protected]>.
The rest is Reviewed-by: Christian König <[email protected]>
Thank you!
So I think now I need acks to merge via drm-misc for all the drivers
which have their own trees. Which seems to be just xe.
Also interesting for other drivers is that when this lands folks can start passing in their "max size which leads to performance gains" via TTM_POOL_BENEFICIAL_ORDER and get the worst case allocation latency improvements.
I am thinking xe also maxes out at 2MiB pages, for others I don't know.
Yeah, next level up from 2M GTT page is still 1G GTT page. I think we especially need 64K/2M system memory pages on igpu to get some perf back when enabling iommu on some platforms IIRC. Not aware of really needing > 2M, so sounds like we might also benefit by maxing out at 2M, if it reduces allocation latency in some cases.
To clarify a bit, the current semantics of the series are not to max out at the order specified by the driver as the "max beneficial", but to just skip doing direct reclaim above it. Otherwise the TTM pool allocator keeps the current behaviour of trying from MAX_PAGE_ORDER and down. As that is 4MB by default on x86 (and configurable on some platforms via Kconfig), the idea is not to pay the latency cost of direct reclaim for sizes which bring no additional performance benefit to the GPU. And since it will still kick off background reclaim, both past and future allocations can still get the larger order blocks.

If in the future we want to actually max out at the driver specified size that could be discussed. But for now I wanted to have the smallest change in behaviour possible.
Regards,
Tvrtko
v1 thread:
https://lore.kernel.org/dri-devel/[email protected]/
Cc: Alex Deucher <[email protected]>
Cc: Christian König <[email protected]>
Cc: Danilo Krummrich <[email protected]>
Cc: Dave Airlie <[email protected]>
Cc: Gerd Hoffmann <[email protected]>
Cc: Joonas Lahtinen <[email protected]>
Cc: Lucas De Marchi <[email protected]>
Cc: Lyude Paul <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Maxime Ripard <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Cc: Sui Jingfeng <[email protected]>
Cc: Thadeu Lima de Souza Cascardo <[email protected]>
Cc: Thomas Hellström <[email protected]>
Cc: Thomas Zimmermann <[email protected]>
Cc: Zack Rusin <[email protected]>
Tvrtko Ursulin (5):
drm/ttm: Add getter for some pool properties
drm/ttm: Replace multiple booleans with flags in pool init
drm/ttm: Replace multiple booleans with flags in device init
drm/ttm: Allow drivers to specify maximum beneficial TTM pool size
drm/amdgpu: Configure max beneficial TTM pool allocation order
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 7 +--
drivers/gpu/drm/drm_gem_vram_helper.c | 2 +-
drivers/gpu/drm/i915/intel_region_ttm.c | 2 +-
drivers/gpu/drm/loongson/lsdc_ttm.c | 2 +-
drivers/gpu/drm/nouveau/nouveau_ttm.c | 4 +-
drivers/gpu/drm/qxl/qxl_ttm.c | 2 +-
drivers/gpu/drm/radeon/radeon_ttm.c | 4 +-
drivers/gpu/drm/ttm/tests/ttm_bo_test.c | 16 +++----
.../gpu/drm/ttm/tests/ttm_bo_validate_test.c | 2 +-
drivers/gpu/drm/ttm/tests/ttm_device_test.c | 31 +++++--------
drivers/gpu/drm/ttm/tests/ttm_kunit_helpers.c | 22 ++++-----
drivers/gpu/drm/ttm/tests/ttm_kunit_helpers.h | 7 +--
drivers/gpu/drm/ttm/tests/ttm_pool_test.c | 23 +++++-----
drivers/gpu/drm/ttm/ttm_device.c | 7 ++-
drivers/gpu/drm/ttm/ttm_pool.c | 45 +++++++++++--------
drivers/gpu/drm/ttm/ttm_pool_internal.h | 24 ++++++++++
drivers/gpu/drm/ttm/ttm_tt.c | 10 +++--
drivers/gpu/drm/vmwgfx/vmwgfx_drv.c | 4 +-
drivers/gpu/drm/xe/xe_device.c | 2 +-
include/drm/ttm/ttm_device.h | 2 +-
include/drm/ttm/ttm_pool.h | 13 +++---
21 files changed, 125 insertions(+), 106 deletions(-)
create mode 100644 drivers/gpu/drm/ttm/ttm_pool_internal.h