On Tue, Feb 17, 2026 at 03:28:06PM +0100, Philipp Stanner wrote: > On Tue, 2026-02-17 at 15:22 +0100, Christian König wrote: > > On 2/17/26 15:09, Alice Ryhl wrote: > > > On Tue, Feb 17, 2026 at 3:04 PM Philipp Stanner <[email protected]> > > > wrote: > > > > > > > > > > > > > > > > […] > > > > > > > > Thinking more about it you should probably enforce that there is > > > > > > > only > > > > > > > one signaling path for each fence signaling. > > > > > > > > > > > > I'm not really convinced by this. > > > > > > > > > > > > First, the timeout path must be a fence signalling path because the > > > > > > reason you have a timeout in the first place is because the hw might > > > > > > never signal the fence. So if the timeout path deadlocks on a > > > > > > kmalloc(GFP_KERNEL) and the hw never comes around to wake you up, > > > > > > boom. > > > > > > > > > > Mhm, good point. On the other hand the timeout handling should > > > > > probably be considered part of the normal signaling path. > > > > > > > > > > > > Why would anyone want to allocate in a timeout path in the first place > > > > – especially for jobqueue? > > > > > > > > Timeout -> close the associated ring. Done. > > > > JobQueue will signal the done_fences with -ECANCELED. > > > > > > > > What would the driver want to allocate in its timeout path, i.e.: > > > > timeout callback. > > > > > > Maybe you need an allocation to hold the struct delayed_work_struct > > > field that you use to enqueue the timeout? > > > > And the workqueue were you schedule the delayed_work on must have the > > reclaim bit set. > > > > Otherwise it can be that the workqueue finds all kthreads busy and tries to > > start a new one, e.g. allocating task structure...... > > OK, maybe I'm lost, but what delayed_work? > > The jobqueue's delayed work item gets either created on JQ::new() or in > jq.submit_job(). Why would anyone – that is: any driver – implement a > delayed work in its timeout callback? > > That doesn't make sense. > > JQ notifies the driver from its delayed_work through > timeout_callback(), and in that callback the driver closes the > associated firmware ring. > > And it drops the JQ. So it is gone. A new JQ will get a new timeout > work item. > > That's basically all the driver must ever do. Maybe some logging and > stuff. > > With firmware scheduling it should really be that simple. > > And signalling / notifying userspace gets done by jobqueue. > > Right?
What I'm getting at is that a driver author might attempt to implement their own timeout logic instead of using the job queue, and if they do, they might get it wrong in the way I described. You're correct that they shouldn't do this. But you asked how a driver author might get the timeout wrong, and doing it the wrong way is one such way they might do it in the wrong way. Alice
