On 2/17/26 15:09, Alice Ryhl wrote:
> On Tue, Feb 17, 2026 at 3:04 PM Philipp Stanner <[email protected]> wrote:
>>
>> On Tue, 2026-02-10 at 16:45 +0100, Christian König wrote:
>>> On 2/10/26 16:07, Alice Ryhl wrote:
>>>> On Tue, Feb 10, 2026 at 02:56:52PM +0100, Christian König wrote:
>>>>> On 2/10/26 14:49, Alice Ryhl wrote:
>>>>>> On Tue, Feb 10, 2026 at 02:26:31PM +0100, Boris Brezillon wrote:
>>>>>>> On Tue, 10 Feb 2026 13:15:31 +0000
>>>>>>> Alice Ryhl <[email protected]> wrote:
>>>>>>>
>>>>>>>> On Tue, Feb 10, 2026 at 01:36:17PM +0100, Boris Brezillon wrote:
>>>>>>>>> On Tue, 10 Feb 2026 10:15:04 +0000
>>>>>>>>> Alice Ryhl <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> impl MustBeSignalled<'_> {
>>>>>>>>>>     /// Drivers generally should not use this one.
>>>>>>>>>>     fn i_promise_it_will_be_signalled(self) -> WillBeSignalled { ...
>>>>>>>>>>     }
>>>>>>>>>>
>>>>>>>>>>     /// One way to ensure the fence has been signalled is to
>>>>>>>>>>     /// signal it.
>>>>>>>>>>     fn signal_fence(self) -> WillBeSignalled {
>>>>>>>>>>         self.fence.signal();
>>>>>>>>>>         self.i_promise_it_will_be_signalled()
>>>>>>>>>>     }
>>>>>>>>>>
>>>>>>>>>>     /// Another way to ensure the fence will be signalled is to
>>>>>>>>>>     /// spawn a workqueue item that promises to signal it.
>>>>>>>>>>     fn transfer_to_wq(
>>>>>>>>>>         self,
>>>>>>>>>>         wq: &Workqueue,
>>>>>>>>>>         item: impl DmaFenceWorkItem,
>>>>>>>>>>     ) -> WillBeSignalled {
>>>>>>>>>>         // briefly obtain the lock class of the wq to indicate to
>>>>>>>>>>         // lockdep that the signalling path "blocks" on arbitrary
>>>>>>>>>>         // jobs from this wq completing
>>>>>>>>>>         bindings::lock_acquire(&wq->key);
>>>>>>>>>>         bindings::lock_release(&wq->key);
>>>>>>>>>>
>>>>>>>>>>         // enqueue the job
>>>>>>>>>>         wq.enqueue(item, wq);
>>>>>>>>>>
>>>>>>>>>>         // The signature of DmaFenceWorkItem::run() promises to
>>>>>>>>>>         // arrange for it to be signalled.
>>>>>>>>>>         self.i_promise_it_will_be_signalled()
>>>>>>>>>>     }
>>>>>>>>>
>>>>>>>>> I guess what's still missing is some sort of `transfer_to_hw()`
>>>>>>>>> function and a way to flag the IRQ handler taking over the fence
>>>>>>>>> signaling token.
>>>>>>>>
>>>>>>>> Yes, transfer to hardware needs to be another piece of logic similar
>>>>>>>> to transfer to wq. And I imagine there are many ways such a transfer
>>>>>>>> to hardware could work.
>>>>>>>>
>>>>>>>> Unless you have a timeout on it, in which case the WillBeSignalled
>>>>>>>> is satisfied by the timeout alone, and the signalling that happens
>>>>>>>> from the irq is just an opportunistic signal from outside the dma
>>>>>>>> fence signalling critical path.
>>>>>>>
>>>>>>> Yes and no. If it deadlocks in the completion WorkItem because of
>>>>>>> allocations (or any of the forbidden use cases), I think we want to
>>>>>>> catch that, because that's a sign fences are likely to end up with
>>>>>>> timeouts when they should have otherwise been signaled properly.
>>>>>>>
>>>>>>>> Well ... unless triggering timeouts can block on GFP_KERNEL
>>>>>>>> allocations...
>>>>>>>
>>>>>>> I mean, the timeout handler should also be considered a DMA-signalling
>>>>>>> path, and the same rules should apply to it.
>>>>>>
>>>>>> I guess that's fair. Even with a timeout you want both to be
>>>>>> signalling paths.
>>>>>>
>>>>>> More generally, if a fence is signalled by mechanism A or B, whichever
>>>>>> happens first, you have the choice between:
>>>>>
>>>>> That doesn't happen in practice.
>>>>>
>>>>> For each fence you only have one signaling path you need to guarantee
>>>>> forward progress for.
>>>>>
>>>>> All other signaling paths are just opportunistic optimizations which
>>>>> *can* signal the fence, but there is no guarantee that they will.
>>>>>
>>>>> We used to have some exceptions to that, especially around aborting
>>>>> submissions, but those turned out to be a really bad idea as well.
>>>>>
>>>>> Thinking more about it, you should probably enforce that there is only
>>>>> one signaling path for each fence.
>>>>
>>>> I'm not really convinced by this.
>>>>
>>>> First, the timeout path must be a fence signalling path, because the
>>>> reason you have a timeout in the first place is that the hw might never
>>>> signal the fence. So if the timeout path deadlocks on a
>>>> kmalloc(GFP_KERNEL) and the hw never comes around to wake you up, boom.
>>>
>>> Mhm, good point. On the other hand, the timeout handling should probably
>>> be considered part of the normal signaling path.
>>
>>
>> Why would anyone want to allocate in a timeout path in the first place –
>> especially for jobqueue?
>>
>> Timeout -> close the associated ring. Done.
>> JobQueue will signal the done_fences with -ECANCELED.
>>
>> What would the driver want to allocate in its timeout path, i.e. the
>> timeout callback?
>
> Maybe you need an allocation to hold the struct delayed_work field that
> you use to enqueue the timeout?
And the workqueue you schedule the delayed_work on must have the reclaim
bit set. Otherwise it can happen that the workqueue finds all kthreads
busy and tries to start a new one, e.g. allocating a task structure...

You also potentially want device core dumps. Those usually use GFP_NOWAIT
so that they can't cycle back and wait for some fence. The downside is
that they can trivially fail under even light memory pressure.

Regards,
Christian.

>
> Alice
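[Editor's note: the typestate idea discussed up-thread can be sketched in
plain userspace Rust. This is an illustrative toy only, under the
assumption that a `MustBeSignalled` token is handed out when a fence is
published and can only be discharged by converting it into a
`WillBeSignalled` proof; all names (`Fence`, `MustBeSignalled`,
`WillBeSignalled`, `signal_fence`) are hypothetical stand-ins, not the
proposed kernel API, and the real code would wrap `struct dma_fence` and
lockdep rather than an atomic bool.]

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

// Toy stand-in for a dma_fence: just a flag recording whether it signalled.
struct Fence {
    signalled: AtomicBool,
}

impl Fence {
    fn new() -> Arc<Self> {
        Arc::new(Fence { signalled: AtomicBool::new(false) })
    }
    fn signal(&self) {
        self.signalled.store(true, Ordering::Release);
    }
    fn is_signalled(&self) -> bool {
        self.signalled.load(Ordering::Acquire)
    }
}

/// Token created when a fence is published. Because its only methods
/// take `self` by value and return a `WillBeSignalled`, the type system
/// forces the caller to arrange for signalling along exactly one path.
struct MustBeSignalled {
    fence: Arc<Fence>,
}

/// Zero-sized proof that signalling has been arranged.
struct WillBeSignalled;

impl MustBeSignalled {
    /// Escape hatch; drivers generally should not use this one.
    fn i_promise_it_will_be_signalled(self) -> WillBeSignalled {
        WillBeSignalled
    }

    /// One way to ensure the fence will be signalled: signal it now.
    fn signal_fence(self) -> WillBeSignalled {
        self.fence.signal();
        self.i_promise_it_will_be_signalled()
    }
}

fn main() {
    let fence = Fence::new();
    let token = MustBeSignalled { fence: Arc::clone(&fence) };
    // The token cannot be duplicated and has no other consuming methods,
    // so discharging it guarantees the fence was handled exactly once.
    let _proof: WillBeSignalled = token.signal_fence();
    assert!(fence.is_signalled());
}
```

A `transfer_to_wq()` analogue would be another consuming method that moves
the token into a work item, mirroring the sketch quoted above.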
