On Thu, 2025-10-09 at 09:53 +0100, Tvrtko Ursulin wrote:
> 
> On 08/10/2025 15:39, Thomas Hellström wrote:
> > On Wed, 2025-10-08 at 16:02 +0200, Christian König wrote:
> > > On 08.10.25 15:50, Tvrtko Ursulin wrote:
> > > > 
> > > > On 08/10/2025 13:35, Christian König wrote:
> > > > > On 08.10.25 13:53, Tvrtko Ursulin wrote:
> > > > > > Disclaimer:
> > > > > > Please note that as this series includes a patch which
> > > > > > touches a good number of drivers, I will only copy everyone
> > > > > > on the cover letter and the respective patch. The assumption
> > > > > > is that people are subscribed to dri-devel and so can look
> > > > > > at the whole series there. I know someone is bound to
> > > > > > complain either way - both when everyone is copied on
> > > > > > everything (too much email) and in this case. So please be
> > > > > > flexible.
> > > > > > 
> > > > > > Description:
> > > > > > 
> > > > > > All drivers which use the TTM pool allocator end up
> > > > > > requesting large order allocations when allocating large
> > > > > > buffers. Those can be slow due to memory pressure and so add
> > > > > > latency to buffer creation. But there is often also a size
> > > > > > limit above which contiguous blocks do not bring any
> > > > > > performance benefits. This series allows drivers to say when
> > > > > > it is okay for TTM to try a bit less hard.
> > > > > > 
> > > > > > We do this by allowing drivers to specify this cut-off
> > > > > > point when creating the TTM device and pools. Allocations
> > > > > > above this size will skip direct reclaim, so under memory
> > > > > > pressure worst case latency will improve. Background
> > > > > > reclaim is still kicked off, and both before and after
> > > > > > memory pressure all the TTM pool buckets remain in use as
> > > > > > they are today.
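> > > > > > 
> > > > > > To illustrate the idea (only a sketch, not the actual
> > > > > > patch; the threshold field name below is made up), the gfp
> > > > > > mask adjustment boils down to something like:
> > > > > > 
> > > > > > static gfp_t pool_gfp_for_order(struct ttm_pool *pool,
> > > > > > 				unsigned int order, gfp_t gfp)
> > > > > > {
> > > > > > 	/* Above the driver provided cut-off: skip direct
> > > > > > 	 * reclaim, but still wake kswapd so background
> > > > > > 	 * reclaim keeps running.
> > > > > > 	 */
> > > > > > 	if (order > pool->no_direct_reclaim_order) {
> > > > > > 		gfp &= ~__GFP_DIRECT_RECLAIM;
> > > > > > 		gfp |= __GFP_KSWAPD_RECLAIM;
> > > > > > 	}
> > > > > > 	return gfp;
> > > > > > }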
> > > > > > 
> > > > > > This is especially interesting if someone has configured
> > > > > > MAX_PAGE_ORDER higher than the default. And even with the
> > > > > > default, with amdgpu for example, the last patch in the
> > > > > > series makes use of the new feature by telling TTM that
> > > > > > above 2MiB we do not expect performance benefits, which
> > > > > > makes TTM not attempt direct reclaim for the top bucket
> > > > > > (4MiB).
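> > > > > > 
> > > > > > As a hypothetical sketch of the driver side hookup (the
> > > > > > flag macro name here is invented for illustration, not the
> > > > > > exact API of this series):
> > > > > > 
> > > > > > 	/* amdgpu: contiguous blocks above 2MiB bring no
> > > > > > 	 * performance benefit, so let TTM skip direct
> > > > > > 	 * reclaim for larger orders.
> > > > > > 	 */
> > > > > > 	pool_flags |= TTM_POOL_NO_DIRECT_RECLAIM_ABOVE(SZ_2M);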
> > > > > > 
> > > > > > The end result is that TTM drivers become slightly nicer
> > > > > > mm citizens and users benefit from better worst case buffer
> > > > > > creation latencies. As a side benefit we get rid of two
> > > > > > instances of those often very unreadable function
> > > > > > signatures with multiple nameless booleans.
> > > > > > 
> > > > > > If this sounds interesting and gets merged, the individual
> > > > > > drivers can follow up with patches configuring their
> > > > > > thresholds.
> > > > > > 
> > > > > > v2:
> > > > > >    * Christian suggested passing in the new data by
> > > > > > changing the function signatures.
> > > > > > 
> > > > > > v3:
> > > > > >    * Moved ttm pool helpers into new ttm_pool_internal.h.
> > > > > > (Christian)
> > > > > 
> > > > > Patch #3 is Acked-by: Christian König <christian.koe...@amd.com>.
> > > > > 
> > > > > The rest is Reviewed-by: Christian König <christian.koe...@amd.com>
> > > > 
> > > > Thank you!
> > > > 
> > > > So I think now I need acks to merge via drm-misc from all the
> > > > drivers which have their own trees, which seems to be just xe.
> > > 
> > > I think you should ping the XE guys for their opinion, but since
> > > there shouldn't be any functional change for them you can
> > > probably go ahead and merge the patches to drm-misc-next if
> > > there is no reply in time.
> > 
> > I will try to do a review tonight. One thing that comes up,
> > though, is the change to ttm_device_init() where you add
> > pool_flags. I had another patch series a number of months ago that
> > instead added a struct with flags there, to select the return
> > value given when OOM. Now that we're adding an argument, should we
> > try to use a struct instead so that we can use it for more than
> > pool behavior?
> > 
> > 
> > I'll be able to find a pointer to that series later today.
> 
> Found it: 
> https://lore.kernel.org/dri-devel/20241002122422.287276-1-thomas.hellst...@linux.intel.com/
> 
> Glad to see in that thread it isn't just me permanently slowed down
> by "false, false" and similar. :)
> 
> I considered using a struct too, and I guess there wasn't much
> swaying either way, so I went with flags. I thought not to
> overcomplicate things with an on-stack struct, which is mostly not
> needed for something so low level, and to stick with the old school
> C visual patterns.
> 
> Since you only needed a single boolean in your series, I suppose you
> could just follow up on my series if you find it acceptable. Or I
> can go with yours, no problem either way.

It seems yours has the most momentum ATM. I can follow up on yours. It
would be great if we could perhaps change the naming of "pool_flags" to
something more generic.
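
For concreteness, something along these lines perhaps (the names
below are just made up to illustrate the shape, not a concrete
proposal):

struct ttm_device_args {
	/* Pool behavior, as in this series. */
	u32 pool_flags;
	/* Room for other knobs, e.g. selecting the error code
	 * returned on OOM as in my earlier series.
	 */
	u32 flags;
};

int ttm_device_init(struct ttm_device *bdev,
		    const struct ttm_device_funcs *funcs,
		    struct device *dev, struct address_space *mapping,
		    struct drm_vma_offset_manager *vma_manager,
		    const struct ttm_device_args *args);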

Thanks,
Thomas


> 
> Regards,
> 
> Tvrtko
> 
