On 08/10/2025 15:39, Thomas Hellström wrote:
On Wed, 2025-10-08 at 16:02 +0200, Christian König wrote:
On 08.10.25 15:50, Tvrtko Ursulin wrote:
On 08/10/2025 13:35, Christian König wrote:
On 08.10.25 13:53, Tvrtko Ursulin wrote:
Disclaimer:
Please note that as this series includes a patch which touches
a good number of
drivers I will only copy everyone in the cover letter and the
respective patch.
Assumption is people are subscribed to dri-devel so they can look at
the whole series there. I know someone is bound to complain either
way, whether because everyone is copied on everything and gets too
much email, or because of this other case. So please be flexible.
Description:
All drivers which use the TTM pool allocator end up requesting
large order allocations when allocating large buffers. Those can be
slow due to memory pressure and so add latency to buffer creation.
But there is often also a size limit above which contiguous blocks
do not bring any performance benefit. This series allows drivers to
say when it is okay for TTM to try a bit less hard.
We do this by allowing drivers to specify this cut-off point when
creating the TTM device and pools. Allocations above this size will
skip direct reclaim, so under memory pressure worst-case latency
will improve. Background reclaim is still kicked off, and both
before and after memory pressure all the TTM pool buckets remain in
use as they are today.
This is especially interesting if someone has configured
MAX_PAGE_ORDER higher than the default. And even with the default,
with amdgpu for example, the last patch in the series makes use of
the new feature by telling TTM that above 2MiB we do not expect
performance benefits, which makes TTM not try direct reclaim for
the top bucket (4MiB).
End result is TTM drivers become a tiny bit nicer mm citizens and
users benefit from better worst-case buffer creation latencies. As
a side benefit we get rid of two instances of those often very
unreadable multiple-nameless-booleans function signatures.
If this sounds interesting and gets merged, the individual drivers
can follow up with patches configuring their thresholds.
v2:
* Christian suggested passing in the new data by changing the
function signatures.
v3:
* Moved ttm pool helpers into new ttm_pool_internal.h.
(Christian)
Patch #3 is Acked-by: Christian König <[email protected]>.
The rest is Reviewed-by: Christian König
<[email protected]>
Thank you!
So I think now I need acks to merge via drm-misc for all the
drivers which have their own trees. Which seems to be just xe.
I think you should ping the XE guys for their opinion, but since
there shouldn't be any functional change for them you can probably go
ahead and merge the patches to drm-misc-next when there is no reply
in time.
I will try to do a review tonight. One thing that comes up though, is
the change to ttm_device_init() where you add pool_flags. I had another
patch series a number of months ago that instead added a struct with
flags there to select the return value given when OOM. Now that we're
adding an argument, should we try to use a struct instead so that we
can use it for more than pool behavior?
I'll be able to find a pointer to that series later today.
Found it:
https://lore.kernel.org/dri-devel/[email protected]/
Glad to see in that thread it isn't just me permanently slowed down by
"false, false" and similar. :)
I considered using a struct too, and I guess there wasn't a strong
enough reason either way that I went with flags. I thought not to
overcomplicate things with an on-stack struct, which is mostly not
needed for something so low level, and to stick with the old-school
C visual patterns.
Since you only needed a single boolean in your series I suppose you
could just follow up on my series if you find it acceptable. Or I can go
with yours, no problem either.
Regards,
Tvrtko