+Cc Lyude, Danilo

On Thu, 2025-11-20 at 15:41 +0100, Christian König wrote:
> Exceeding the recommended maximum timeout should be noted in logs and
> crash dumps.
> 
> Signed-off-by: Christian König <[email protected]>
> ---
>  drivers/gpu/drm/scheduler/sched_main.c | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
> b/drivers/gpu/drm/scheduler/sched_main.c
> index 1d4f1b822e7b..88e24e140def 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -1318,12 +1318,22 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
> const struct drm_sched_init_
>         sched->ops = args->ops;
>         sched->credit_limit = args->credit_limit;
>         sched->name = args->name;
> -       sched->timeout = args->timeout;
>         sched->hang_limit = args->hang_limit;
>         sched->timeout_wq = args->timeout_wq ? args->timeout_wq :
> system_percpu_wq;
>         sched->score = args->score ? args->score : &sched->_score;
>         sched->dev = args->dev;
>  
> +       sched->timeout = args->timeout;
> +       if (sched->timeout > DMA_FENCE_MAX_REASONABLE_TIMEOUT) {
> +               dev_warn(sched->dev, "Timeout %ld exceeds the maximum
> recommended one!\n",
> +                        sched->timeout);
> +               /*
> +                * Make sure that exceeding the recommendation is noted in
> +                * logs and crash dumps.
> +                */
> +               add_taint(TAINT_SOFTLOCKUP, LOCKDEP_STILL_OK);
> +       }
> +

I have to NACK this in its current form. It would cause a bunch of
drivers to fire warnings, even though there has been absolutely nothing
wrong with them so far:

https://elixir.bootlin.com/linux/v6.18-rc6/source/drivers/gpu/drm/nouveau/nouveau_sched.c#L412
https://elixir.bootlin.com/linux/v6.18-rc6/source/drivers/gpu/drm/lima/lima_sched.c#L519

I guess there are more.

Nouveau's current timeout is an astonishing 10 seconds, and AFAIK there
has never been a problem with that. If you want to declare this behavior
invalid, you need to discuss that with the Nouveau maintainers first.

It also didn't become clear to me why dma_fence should be the one to
define a timeout rule. I like to think that "must be signalled within
reasonable time" is as precise as it gets. As demonstrated by the
drivers, there is just no objectively correct definition of "reasonable".

BTW, your series doesn't make clear to me why you only touch very few
components: there are many more users of dma_fence than just vgem and
sched. What about the others?

P.
