On Tue, Feb 17, 2026 at 03:28:06PM +0100, Philipp Stanner wrote:
> On Tue, 2026-02-17 at 15:22 +0100, Christian König wrote:
> > On 2/17/26 15:09, Alice Ryhl wrote:
> > > On Tue, Feb 17, 2026 at 3:04 PM Philipp Stanner <[email protected]> wrote:
> > > > > > > 
> > > > > > > 
> 
> […]
> 
> > > > > > > Thinking more about it, you should probably enforce that there is
> > > > > > > only one signaling path for each fence.
> > > > > > 
> > > > > > I'm not really convinced by this.
> > > > > > 
> > > > > > First, the timeout path must be a fence signalling path, because the
> > > > > > reason you have a timeout in the first place is that the hw might
> > > > > > never signal the fence. So if the timeout path deadlocks on a
> > > > > > kmalloc(GFP_KERNEL) and the hw never comes around to wake you up,
> > > > > > boom.
> > > > > 
> > > > > Mhm, good point. On the other hand, the timeout handling should
> > > > > probably be considered part of the normal signaling path.
> > > > 
> > > > 
> > > > Why would anyone want to allocate in a timeout path in the first place 
> > > > – especially for jobqueue?
> > > > 
> > > > Timeout -> close the associated ring. Done.
> > > > JobQueue will signal the done_fences with -ECANCELED.
> > > > 
> > > > What would the driver want to allocate in its timeout path, i.e. in
> > > > its timeout callback?
> > > 
> > > Maybe you need an allocation to hold the struct delayed_work field
> > > that you use to enqueue the timeout?
> > 
> > And the workqueue where you schedule the delayed_work must have the
> > reclaim bit (WQ_MEM_RECLAIM) set.
> > 
> > Otherwise the workqueue may find all its kthreads busy and try to start
> > a new one, which means allocating a task structure, and so on...
> 
> OK, maybe I'm lost, but what delayed_work?
> 
> The jobqueue's delayed work item is created either in JQ::new() or in
> jq.submit_job(). Why would anyone – that is: any driver – implement a
> delayed work in its timeout callback?
> 
> That doesn't make sense.
> 
> JQ notifies the driver from its delayed_work through
> timeout_callback(), and in that callback the driver closes the
> associated firmware ring.
> 
> And it drops the JQ. So it is gone. A new JQ will get a new timeout
> work item.
> 
> That's basically all the driver must ever do. Maybe some logging and
> stuff.
> 
> With firmware scheduling it should really be that simple.
> 
> And signalling / notifying userspace gets done by jobqueue.
> 
> Right?
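
A minimal userspace Rust sketch of the split Philipp describes. The jobqueue owns the timeout. The driver's callback only closes its ring. The jobqueue then signals every still-pending done fence with -ECANCELED. The names JobQueue, TimeoutHandler, FwRingDriver and fire_timeout are hypothetical and do not come from the real jobqueue API. ECANCELED is assumed to be the asm-generic value 125.

```rust
// Hypothetical sketch: JobQueue, TimeoutHandler, FwRingDriver and
// fire_timeout are illustrative names, not the kernel jobqueue API.

/// Linux errno ECANCELED (125 on asm-generic architectures such as x86 and arm64).
pub const ECANCELED: i32 = 125;

/// The driver's side of the contract: react to a timeout, nothing more.
pub trait TimeoutHandler {
    fn timeout_callback(&mut self);
}

/// Hypothetical driver for a firmware-scheduled ring.
pub struct FwRingDriver {
    pub ring_open: bool,
}

impl TimeoutHandler for FwRingDriver {
    fn timeout_callback(&mut self) {
        // Close the associated firmware ring. No allocation, no waiting.
        self.ring_open = false;
    }
}

/// Done-fence state per job: None = pending, Some(0) = success,
/// Some(-errno) = signaled with an error.
pub struct JobQueue<D: TimeoutHandler> {
    driver: D,
    done_fences: Vec<Option<i32>>,
}

impl<D: TimeoutHandler> JobQueue<D> {
    pub fn new(driver: D) -> Self {
        JobQueue { driver, done_fences: Vec::new() }
    }

    /// Submission creates the job's done fence; allocating here is fine
    /// because this is not the timeout path.
    pub fn submit_job(&mut self) -> usize {
        self.done_fences.push(None);
        self.done_fences.len() - 1
    }

    /// The hardware/firmware finished a job normally.
    pub fn complete(&mut self, job: usize) {
        if self.done_fences[job].is_none() {
            self.done_fences[job] = Some(0);
        }
    }

    /// Stands in for the jobqueue's own timeout work item: notify the
    /// driver, then signal every still-pending done fence with -ECANCELED.
    pub fn fire_timeout(&mut self) {
        self.driver.timeout_callback();
        for fence in self.done_fences.iter_mut().filter(|f| f.is_none()) {
            *fence = Some(-ECANCELED);
        }
    }

    pub fn fence(&self, job: usize) -> Option<i32> {
        self.done_fences[job]
    }

    pub fn driver(&self) -> &D {
        &self.driver
    }
}
```

The driver never touches a fence here. Suppose it submits two jobs, one completes, and then the timeout fires. The completed job keeps its success value, the other is canceled, and the ring ends up closed. Dropping the queue afterwards corresponds to "a new JQ will get a new timeout work item".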

What I'm getting at is that a driver author might attempt to implement
their own timeout logic instead of using the job queue, and if they do,
they might get it wrong in the way I described.

You're correct that they shouldn't do this. But you asked how a driver
author might get the timeout wrong, and rolling their own timeout logic
is one such way.
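
A userspace Rust sketch of the mistake, under stated assumptions. One timeout handler allocates its work item only when the timeout hits. This is the analogue of a kmalloc(GFP_KERNEL) on a fence signalling path. The other handler works on state that was allocated at setup time. A per-thread counting global allocator stands in for "did the timeout path allocate?". TimeoutWork, on_timeout_allocating, on_timeout_preallocated and allocations_during are hypothetical names.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::cell::Cell;

// Counts heap allocations per thread; in this userspace sketch it stands
// in for "did the timeout path call kmalloc()?".
struct CountingAlloc;

thread_local! {
    // const-initialized and Drop-free, so touching it never allocates.
    static ALLOCS: Cell<usize> = const { Cell::new(0) };
}

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let _ = ALLOCS.try_with(|c| c.set(c.get() + 1));
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Hypothetical timeout state; in the kernel this would be roughly a
/// struct delayed_work plus whatever the handler needs.
pub struct TimeoutWork {
    pub armed: bool,
    pub fired: u32,
}

/// Anti-pattern: the handler allocates its work item when the timeout
/// hits, the analogue of kmalloc(GFP_KERNEL) on a fence signalling path.
pub fn on_timeout_allocating() -> Box<TimeoutWork> {
    let mut work = Box::new(TimeoutWork { armed: true, fired: 0 });
    work.armed = false;
    work.fired += 1;
    work
}

/// Preferred: the work item was allocated at setup time, so the timeout
/// path only touches memory that already exists.
pub fn on_timeout_preallocated(work: &mut TimeoutWork) {
    work.armed = false;
    work.fired += 1;
}

/// Number of heap allocations `f` performed on the current thread.
pub fn allocations_during<F: FnOnce()>(f: F) -> usize {
    let before = ALLOCS.with(|c| c.get());
    f();
    ALLOCS.with(|c| c.get()) - before
}
```

Measured with allocations_during, the first handler allocates at least once. The second allocates nothing, because its TimeoutWork already exists before the timeout can fire. The analogy covers only the allocator. The other half of Christian's point, WQ_MEM_RECLAIM guaranteeing a rescuer thread for the workqueue, has no counterpart in this sketch.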

Alice
