fence: give some reasonable maximum signaling timeout

Lucas Stach Wed, 26 Nov 2025 08:11:16 -0800

On Wednesday, 26.11.2025 at 16:44 +0100, Philipp Stanner wrote:
> On Wed, 2025-11-26 at 16:03 +0100, Christian König wrote:
> 
> > > 
[...]
> > > My hope would be that in the mid-term future we'd get firmware rings
> > > that can be preempted through a firmware call for all major hardware.
> > > Then a huge share of our problems would disappear.
> > 
> > At least on AMD HW pre-emption is actually horribly unreliable as
> > well.
> 
> Do you mean new GPUs with firmware scheduling, or what is "HW
> pre-emption"?
> 
> With firmware interfaces, my hope would be that you could simply tell
> 
> stop_running_ring(nr_of_ring)
> // time slice for someone else
> start_running_ring(nr_of_ring)
> 
> Thereby getting real scheduling and all that. And eliminating many
> other problems we know well from drm/sched.


Preemption is a hard problem on GPUs whether or not you have firmware
scheduling. CPUs have a limited amount of software-visible state that
needs to be saved/restored on a context switch, and even there people
are now starting to complain about having to context switch the AVX-512
register set.

GPUs have megabytes of software-visible state, which needs to be
saved/restored on a context switch if you want fine-grained preemption
with low preemption latency. There might be points in the command
execution where you can ignore most of that state, but reaching those
points can have basically unbounded latency. So either you can reliably
save/restore lots of state, or you are limited to very coarse-grained
preemption with all the usual issues of timeouts and DoS vectors.

I'm not totally up to speed with the current state across all relevant
GPUs, but until recently NVIDIA was the only vendor with truly reliable
fine-grained preemption.

Regards,
Lucas

Re: [PATCH 1/4] dma-buf/fence: give some reasonable maximum signaling timeout
