On 6/10/26 16:25, Philipp Stanner wrote:
> On Tue, 2026-06-09 at 15:34 +0200, Christian König wrote:
>> On 6/9/26 15:19, Philipp Stanner wrote:
>>> And look what I found:
>>>
>>> static inline bool
>>> nouveau_cli_work_ready(struct dma_fence *fence)
>>> {
>>> unsigned long flags;
>>> bool ret = true;
>>>
>>> dma_fence_lock_irqsave(fence, flags);
>>> if (!dma_fence_is_signaled_locked(fence))
>>> ret = false;
>>> dma_fence_unlock_irqrestore(fence, flags);
>>>
>>> if (ret == true)
>>> dma_fence_put(fence);
>>> return ret;
>>> }
>>>
>>>
>>> That looks weird, doesn't it?
>>
>> No, that is pretty much expected.
>>
>> This issue results because of the lock inversion/cleanup race between
>> nouveau_fence_chan->lock and dropping the last reference.
>>
>> That in turn is caused by the fact that enable_signaling is called
>> with the fence lock held
>
>
> Are you referring to this comment from the documentation?
>
> * Since many implementations can call dma_fence_signal() even when before
> * @enable_signaling has been called there's a race window, where the
> * dma_fence_signal() might result in the final fence reference being
> * released and its memory freed. To avoid this, implementations of this
> * callback should grab their own reference using dma_fence_get(), to be
> * released when the fence is signalled (through e.g. the interrupt
> * handler).
> *
> * This callback is optional. If this callback is not present, then the
> * driver must always have signaling enabled.
> */
> bool (*enable_signaling)(struct dma_fence *fence);
Yes, that was an extremely bad idea which I have tried multiple times to fix.
>
>> and delegates the signaling to the caller instead of doing it itself.
>
> Who delegates what to whom?
>
> enable_signaling() is called indirectly by someone who adds a callback.
> If enable_signaling() is implemented, then the driver callback's only
> job is to activate some sort of interrupt or worker that will signal
> that fence at some point.
No, enable_signal also returns if enabling was successfully if it wasn't
successfully the dma_fence framework signals the fence.
>> This in turn means that you can't do proper cleanup after the
>> signaling is done by grabbing driver specific locks.
>
> Sure you can. If everything is properly synchronized.
>
> // driver
> dma_fence_signal(f);
> // all callbacks can't reach our driver anymore
That's irrelevant. The question is not if a callback can reach the backend
after signaling.
The question is if the backend can clean up after dma_fence_signal() completes.
> struct driver_fence = container_of(f);
> lock(driver_fence->special_lock);
> cleanup(f);
> unlock(…);
>
> Where is the deadlock?
>
> I continue to fail to see it. Show me an example of some code that
> would deadlock, please. Either fictive or from the kernel.
Interrupt driven signaling path:
spin_lock_irqsave(driver->fence_list_lock, flags);
list_for_each_entry_safe(...) {
if (fence->seqno < signaled_seqno)
break;
dma_fence_signal(fence);
list_entry_del(&fence->list);
dma_fence_put(fence);
}
spin_unlock_irqsave(driver->list_lock, flags);
Enable signaling path:
myfence_enable_signaling()
{
if (fence->seqno <= signaled_seqno)
return false;
talk_to_the_hw();
return true;
}
The problem here is that the enable_signaling path can't grab the
driver->fence_list_lock because that would be lock inversion with the fence
lock.
Implementations came up with tons of workarounds for this which only work more
or less correctly. The issues Nouveau had is just the tip of the iceberg here.
If we nuke the fact that enable_signaling() is called while holding the fence
lock all that complexity goes away. In other words it would then look like this:
myfence_enable_signaling()
{
spin_lock_irqsave(driver->fence_list_lock, flags);
if (fence->seqno > signaled_seqno) {
talk_to_the_hw();
} else {
dma_fence_signal(fence);
list_del(fence->list);
dma_fence_put(fence);
}
spin_unlock_irqrestore(driver->fence_list_lock, flags);
}
> Nouveau's enable_signaling() callback does not take locks. Its
> signaled() does not take locks.
Yeah because Nouveau reverted like most driver to use the same spinlock for the
driver lock and the fence lock.
But that approach is fundamentally broken, a) you can't cleanup from the
is_signaled path because that can be called with both the lock held and not
held and b) it doesn't allow the fences to be independent of the driver who
issued them.
When it would just be that we consistently call the is_signaled path with the
lock held I would immediately agree, but that breaks fence independence and
already caused so many issues that I clearly want to remove it.
Regards,
Christian.
>
> I went through all implementors of signaled() and found that only xe
> and nouveau invoke some function pointer that we want to investigate
> closer:
>
> User Way signaled() operates
> -----------------------------------------------
> amdgpu_userq_fence.c lockless
> etnaviv lockless
> i915 lockless
> nouveau lockless?
> radeon lockless
> vc4 lockless
> xe lockless?
>
>
> Who will be deadlocking and with which locks?
>
>
>> This is *exactly* the problem I'm trying to prevent here.
>>
>> When the callbacks wouldn't be called with the fence lock held the
>> Nouveau nouveau_fence_chan->lock and the fence lock would be
>> completely independent.
>
> You seem to circle around the idea that Nouveau's fence list is
> protected with the shared lock?
>
> That issue, if it exists, is solvable through the embedded lock.
>
> Anyways, there is no issue. Neither Nouveau nor other drivers will
> deadlock if we call ops->signaled() with the fence lock around.
>
>>
>> This results in much better cleanup paths, fences which are
>> independent of their issuers and in general much simpler handling for
>> all dma_fence implementation backends because we don't need to worry
>> all the time about lock inversions between the fence lock and
>> internal driver locks.
>>
>> So as far as I can see what you suggest here is exactly what has
>> caused all the problems in the first place.
>>
>> For the cleanup path in Rust you should be trivially able to use
>> call_rcu() if the synchronized cleanup path would be causing issues
>> (which I clearly agree on).
>
> You can *satisfy the current dma_fence API* with call_rcu(), but as
> Danilo states delaying work always has subtleties – plus I don't really
> want to have an unnecessary call_rcu() call for tens of thousands of
> fences per second.
>
> And I would prefer if the dma_fence implementation just wouldn't be
> racing. It is clearly desirable that a call to dma_fence_signal()
> decouples the callbacks. That's even how you have designed it, just
> with an unnecessary graceperiod thereafter.
>
> I also doubt that it's robust that dma_fence_is_signaled() can return
> true while the callbacks are not all executed.
>
> And all the RCU dancing is fragile, where people need to guard API
> calls with the RCU read lock instead of having the ability to rely on
> their reference / refcount. It would be far safer if dma_fence would
> guard against everything with the lock.
>
> Summarizing:
> * enable_signaling() uses the lock already.
> * set_deadline() users use the lock around the entire callback.
> * dma_fence_is_signaled_locked() demands that it must always be
> possible to invoke ops->signaled() with a lock, as Danilo pointed
> out.
> * The few implementors of ops->signaled(), see my list above, all do
> atomic operations without any locks at all.
> * The other callbacks are either deprecated or not relevant in this
> regard.
>
> So making dma_fence locking consistent would make all our problems
> disappear, could probably even solve the unload-problem if
> dma_fence_signal() becomes a hard, synchronous decoupling point, and it
> comes at no proven cost.
>
>
>
> P.