On 2/17/26 15:09, Alice Ryhl wrote:
> On Tue, Feb 17, 2026 at 3:04 PM Philipp Stanner <[email protected]> wrote:
>>
>> On Tue, 2026-02-10 at 16:45 +0100, Christian König wrote:
>>> On 2/10/26 16:07, Alice Ryhl wrote:
>>>> On Tue, Feb 10, 2026 at 02:56:52PM +0100, Christian König wrote:
>>>>> On 2/10/26 14:49, Alice Ryhl wrote:
>>>>>> On Tue, Feb 10, 2026 at 02:26:31PM +0100, Boris Brezillon wrote:
>>>>>>> On Tue, 10 Feb 2026 13:15:31 +0000
>>>>>>> Alice Ryhl <[email protected]> wrote:
>>>>>>>
>>>>>>>> On Tue, Feb 10, 2026 at 01:36:17PM +0100, Boris Brezillon wrote:
>>>>>>>>> On Tue, 10 Feb 2026 10:15:04 +0000
>>>>>>>>> Alice Ryhl <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> impl MustBeSignalled<'_> {
>>>>>>>>>>     /// Drivers generally should not use this one.
>>>>>>>>>>     fn i_promise_it_will_be_signalled(self) -> WillBeSignalled { ... }
>>>>>>>>>>
>>>>>>>>>>     /// One way to ensure the fence has been signalled is to signal it.
>>>>>>>>>>     fn signal_fence(self) -> WillBeSignalled {
>>>>>>>>>>         self.fence.signal();
>>>>>>>>>>         self.i_promise_it_will_be_signalled()
>>>>>>>>>>     }
>>>>>>>>>>
>>>>>>>>>>     /// Another way to ensure the fence will be signalled is to spawn a
>>>>>>>>>>     /// workqueue item that promises to signal it.
>>>>>>>>>>     fn transfer_to_wq(
>>>>>>>>>>         self,
>>>>>>>>>>         wq: &Workqueue,
>>>>>>>>>>         item: impl DmaFenceWorkItem,
>>>>>>>>>>     ) -> WillBeSignalled {
>>>>>>>>>>         // briefly obtain the lock class of the wq to indicate to
>>>>>>>>>>         // lockdep that the signalling path "blocks" on arbitrary jobs
>>>>>>>>>>         // from this wq completing
>>>>>>>>>>         bindings::lock_acquire(&wq->key);
>>>>>>>>>>         bindings::lock_release(&wq->key);
>>>>>>>>>>
>>>>>>>>>>         // enqueue the job
>>>>>>>>>>         wq.enqueue(item, wq);
>>>>>>>>>>
>>>>>>>>>>         // The signature of DmaFenceWorkItem::run() promises to arrange
>>>>>>>>>>         // for it to be signalled.
>>>>>>>>>>         self.i_promise_it_will_be_signalled()
>>>>>>>>>>     }
>>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> I guess what's still missing is some sort of `transfer_to_hw()`
>>>>>>>>> function and way to flag the IRQ handler taking over the fence
>>>>>>>>> signaling token.
>>>>>>>>
>>>>>>>> Yes, transfer to hardware needs to be another piece of logic similar to
>>>>>>>> transfer to wq. And I imagine there are many ways such a transfer to
>>>>>>>> hardware could work.
>>>>>>>>
>>>>>>>> Unless you have a timeout on it, in which case the WillBeSignalled is
>>>>>>>> satisfied by the fact you have a timeout alone, and the signalling that
>>>>>>>> happens from the irq is just an opportunistic signal from outside the
>>>>>>>> dma fence signalling critical path.
>>>>>>>
>>>>>>> Yes and no. If it deadlocks in the completion WorkItem because of
>>>>>>> allocations (or any of the forbidden use cases), I think we want to
>>>>>>> catch that, because that's a sign fences are likely to end up with
>>>>>>> timeouts when they should have otherwise been signaled properly.
>>>>>>>
>>>>>>>> Well ... unless triggering timeouts can block on GFP_KERNEL
>>>>>>>> allocations...
>>>>>>>
>>>>>>> I mean, the timeout handler should also be considered a DMA-signalling
>>>>>>> path, and the same rules should apply to it.
>>>>>>
>>>>>> I guess that's fair. Even with a timeout you want both to be signalling
>>>>>> paths.
>>>>>>
>>>>>> I guess more generally, if a fence is signalled by mechanism A or B,
>>>>>> whichever happens first, you have the choice between:
>>>>>
>>>>> That doesn't happen in practice.
>>>>>
>>>>> For each fence you only have one signaling path you need to guarantee
>>>>> forward progress for.
>>>>>
>>>>> All other signaling paths are just opportunistic optimizations
>>>>> which *can* signal the fence, but there is no guarantee that they
>>>>> will.
>>>>>
>>>>> We used to have some exceptions to that, especially around aborting
>>>>> submissions, but those turned out to be a really bad idea as well.
>>>>>
>>>>> Thinking more about it, you should probably enforce that there is only
>>>>> one signaling path for each fence.
>>>>
>>>> I'm not really convinced by this.
>>>>
>>>> First, the timeout path must be a fence signalling path because the
>>>> reason you have a timeout in the first place is because the hw might
>>>> never signal the fence. So if the timeout path deadlocks on a
>>>> kmalloc(GFP_KERNEL) and the hw never comes around to wake you up, boom.
>>>
>>> Mhm, good point. On the other hand the timeout handling should probably be 
>>> considered part of the normal signaling path.
>>
>>
>> Why would anyone want to allocate in a timeout path in the first place,
>> especially for jobqueue?
>>
>> Timeout -> close the associated ring. Done.
>> JobQueue will signal the done_fences with -ECANCELED.
>>
>> What would the driver want to allocate in its timeout path, i.e. in the
>> timeout callback?
> 
> Maybe you need an allocation to hold the struct delayed_work field
> that you use to enqueue the timeout?

And the workqueue you schedule the delayed_work on must have the reclaim
flag (WQ_MEM_RECLAIM) set.

Otherwise the workqueue may find all its kthreads busy and try to start a
new one, which means allocating a task structure...

You also potentially want device core dumps. Those usually use GFP_NOWAIT so
that they can't cycle back and wait for some fence. The downside is that they
can trivially fail even under light memory pressure.

Regards,
Christian.

> 
> Alice
