On 09/06/2026 14:19, Philipp Stanner wrote:
+Cc Dave
On Mon, 2026-06-08 at 20:47 +0200, Christian König wrote:
On 6/8/26 20:39, Danilo Krummrich wrote:
How did you get to this conclusion considering that you run into what I
mentioned above as well and the fact that we seem to agree that the performance
concern is rather questionable?
Quite simple, it's the cleaner approach.
Calling callbacks with locks held is rather questionable even putting the
performance issue aside.
I'm right now going through all fence users to see whether we can
implement my solution.
And look what I found:
static inline bool
nouveau_cli_work_ready(struct dma_fence *fence)
{
unsigned long flags;
bool ret = true;
dma_fence_lock_irqsave(fence, flags);
if (!dma_fence_is_signaled_locked(fence))
ret = false;
dma_fence_unlock_irqrestore(fence, flags);
if (ret == true)
dma_fence_put(fence);
return ret;
}
That looks weird, doesn't it?
We do some git-blame:
c8a5d5ea3ba6a18958f8d76430e4cd68eea33943
and we find that it's Dave who wrote that code, because
"
My analysis: two threads are running, one in the irq signalling the
fence, in dma_fence_signal_timestamp_locked, it has done the
DMA_FENCE_FLAG_SIGNALLED_BIT setting, but hasn't yet reached the
callbacks.
The second thread in nouveau_cli_work_ready, where it sees the fence is
signalled, so then puts the fence, cleanups the object and frees the
work item, which contains the callback.
Thread one goes again and tries to call the callback and causes the
use-after-free.
"
So this is a race, caused by lockless speed optimization.
And it would further seem that there is one invention in computer
science that can prevent such races:
Locks.
There is no cleaner, safer synchronization strategy in computer science
than locking.
This race became possible because the lock does not guard the entirety
of dma_fence_is_signaled().
One could also argue the problem was caused by sharing the fence
reference without holding it, anyway, I wanted to ask something else.
What happened to the idea to remove opportunistic signalling from
dma_fence_is_signaled?
Regards,
Tvrtko
Wouldn't you agree that this is a strong indicator for the great
advantages that consequent and consistent lock-protection grants? IOW,
by using locks more strictly in dma_fence, we can increase its
robustness and reliability.
P.