On Tue, 2026-02-17 at 15:22 +0100, Christian König wrote: > On 2/17/26 15:09, Alice Ryhl wrote: > > On Tue, Feb 17, 2026 at 3:04 PM Philipp Stanner <[email protected]> wrote: > > > > > > > > > > > >
[…] > > > > > > Thinking more about it you should probably enforce that there is > > > > > > only > > > > > > one signaling path for each fence signaling. > > > > > > > > > > I'm not really convinced by this. > > > > > > > > > > First, the timeout path must be a fence signalling path because the > > > > > reason you have a timeout in the first place is because the hw might > > > > > never signal the fence. So if the timeout path deadlocks on a > > > > > kmalloc(GFP_KERNEL) and the hw never comes around to wake you up, > > > > > boom. > > > > > > > > Mhm, good point. On the other hand the timeout handling should probably > > > > be considered part of the normal signaling path. > > > > > > > > > Why would anyone want to allocate in a timeout path in the first place – > > > especially for jobqueue? > > > > > > Timeout -> close the associated ring. Done. > > > JobQueue will signal the done_fences with -ECANCELED. > > > > > > What would the driver want to allocate in its timeout path, i.e.: timeout > > > callback. > > > > Maybe you need an allocation to hold the struct delayed_work_struct > > field that you use to enqueue the timeout? > > And the workqueue were you schedule the delayed_work on must have the reclaim > bit set. > > Otherwise it can be that the workqueue finds all kthreads busy and tries to > start a new one, e.g. allocating task structure...... OK, maybe I'm lost, but what delayed_work? The jobqueue's delayed work item gets either created on JQ::new() or in jq.submit_job(). Why would anyone – that is: any driver – implement a delayed work in its timeout callback? That doesn't make sense. JQ notifies the driver from its delayed_work through timeout_callback(), and in that callback the driver closes the associated firmware ring. And it drops the JQ. So it is gone. A new JQ will get a new timeout work item. That's basically all the driver must ever do. Maybe some logging and stuff. With firmware scheduling it should really be that simple. And signalling / notifying userspace gets done by jobqueue. Right? > > You also potentially want device core dumps. Those usually use GFP_NOWAIT so > that they can't cycle back and wait for some fence. The down side is that > they can trivially fail under even light memory pressure. Simply logging into dmesg should do the trick, shouldn't it? P.
