On 6/9/26 15:19, Philipp Stanner wrote:
> +Cc Dave
> 
> On Mon, 2026-06-08 at 20:47 +0200, Christian König wrote:
>> On 6/8/26 20:39, Danilo Krummrich wrote:
>>> How did you get to this conclusion considering that you run into what I
>>> mentioned above as well and the fact that we seem to agree that the 
>>> performance
>>> concern is rather questionable?
>>
>> Quite simple, it's the cleaner approach.
>>
>> Calling callbacks with locks held is rather questionable even putting the 
>> performance issue aside.
> 
> 
> I'm right now going through all fence users to see whether we can
> implement my solution.
> 
> And look what I found:
> 
> static inline bool
> nouveau_cli_work_ready(struct dma_fence *fence)
> {
>       unsigned long flags;
>       bool ret = true;
> 
>       dma_fence_lock_irqsave(fence, flags);
>       if (!dma_fence_is_signaled_locked(fence))
>               ret = false;
>       dma_fence_unlock_irqrestore(fence, flags);
> 
>       if (ret == true)
>               dma_fence_put(fence);
>       return ret;
> }
> 
> 
> That looks weird, doesn't it?

No, that is pretty much expected.

This issue results because of the lock inversion/cleanup race between 
nouveau_fence_chan->lock and dropping the last reference.

That in turn is caused by the fact that enable_signaling is called with the 
fence lock held and delegates the signaling to the caller instead of doing it 
itself.

This in turn means that you can't do proper cleanup after the signaling is done 
by grabbing driver specific locks.

This is *exactly* the problem I'm trying to prevent here.

When the callbacks wouldn't be called with the fence lock held the Nouveau 
nouveau_fence_chan->lock and the fence lock would be completely independent.

This results in much better cleanup paths, fences which are independent of 
their issuers and in general much simpler handling for all dma_fence 
implementation backends because we don't need to worry all the time about lock 
inversions between the fence lock and internal driver locks.

So as far as I can see what you suggest here is exactly what has caused all the 
problems in the first place.

For the cleanup path in Rust you should be trivially able to use call_rcu() if 
the synchronized cleanup path would be causing issues (which I clearly agree 
on).

Regards,
Christian.

> 
> 
> We do some git-blame:
> 
> c8a5d5ea3ba6a18958f8d76430e4cd68eea33943
> 
> and we find that it's Dave who wrote that code, because
> 
> "
>     My analysis: two threads are running, one in the irq signalling the
>     fence, in dma_fence_signal_timestamp_locked, it has done the
>     DMA_FENCE_FLAG_SIGNALLED_BIT setting, but hasn't yet reached the
>     callbacks.
>     
>     The second thread in nouveau_cli_work_ready, where it sees the fence is
>     signalled, so then puts the fence, cleanups the object and frees the
>     work item, which contains the callback.
>     
>     Thread one goes again and tries to call the callback and causes the
>     use-after-free.
> "
> 
> 
> So this is a race, caused by lockless speed optimization.
> 
> And it would further seem that there is one invention in computer
> science that can prevent such races:
> 
> Locks.
> 
> There is no cleaner, safer synchronization strategy in computer science
> than locking.
> 
> This race became possible because the lock does not guard the entirety
> of dma_fence_is_signaled().
> 
> Wouldn't you agree that this is a strong indicator for the great
> advantages that consequent and consistent lock-protection grants? IOW,
> by using locks more strictly in dma_fence, we can increase its
> robustness and reliability.
> 
> 
> P.

Reply via email to