Re: [RFC PATCH] dma-fence: Fix races of fence callbacks versus destructors by locking

Christian König Thu, 11 Jun 2026 02:14:54 -0700

On 6/11/26 10:35, Philipp Stanner wrote:
> On Wed, 2026-06-10 at 17:15 +0200, Christian König wrote:
>> On 6/10/26 16:25, Philipp Stanner wrote:
>>>
>>> Are you referring to this comment from the documentation?
>>>
>>>  * Since many implementations can call dma_fence_signal() even when before
>>>  * @enable_signaling has been called there's a race window, where the
>>>  * dma_fence_signal() might result in the final fence reference being
>>>  * released and its memory freed. To avoid this, implementations of this
>>>  * callback should grab their own reference using dma_fence_get(), to be
>>>  * released when the fence is signalled (through e.g. the interrupt
>>>  * handler).
>>>  *
>>>  * This callback is optional. If this callback is not present, then the
>>>  * driver must always have signaling enabled.
>>>  */
>>>  bool (*enable_signaling)(struct dma_fence *fence);
>>
>> Yes, that was an extremely bad idea which I have tried multiple times to fix.
> 
> What's the reason why enable_signaling() is nowadays called with lock-
> protection?


That was some decision originally made a long long time ago because the initial 
thought was that fence need to signal in order, but that concept was also 
abandoned a long long time ago.

Fixing this is on my TODO list ever since we figured out that this was a bad 
idea, see the latest patch set here as well: 
https://www.spinics.net/lists/dri-devel/msg461253.html

>>> Sure you can. If everything is properly synchronized.
>>>
>>> // driver
>>> dma_fence_signal(f);
>>> // all callbacks can't reach our driver anymore
>>
>> That's irrelevant. The question is not if a callback can reach the backend 
>> after signaling.
> 
> Irrelevant for your lock-inversion maybe. It's very relevant for life
> time and module unload. Proof: you and Tvrtko made dma_fence_signal()
> the decoupling point, with RCU protection.
> 
> And as I keep saying, that synchronisation point would make the entire
> framework simpler and more robust if we were making it consistent.

Exactly that's what I strongly disagree on.

The fence lock and the synchronization point which allows the driver to know 
when nobody else is in a callback any more are two very different things we 
should not be mixed up together.

The fence lock protects the signaled state of the fence, so that the fence 
functions can add callbacks, test signaling etc... while the fence can't 
signal. It is protecting internal state of the fence and so should be internal 
to the fence, external code should not touch it if possible.

The synchronization point for the driver is a service the dma_fence 
implementation offers to let drivers know when there is no more caller of their 
function. And usually this is implemented using RCU/SRCU.

Tvrko and I now choose RCU instead of SRCU because there didn't seem to be a 
need to sleep in the fence callbacks.

But both RCU and SRCU offer not only synchronize_rcu()/_srcu() but also 
call_rcu()/_srcu() which allows drivers to delegate the cleanup to a point 
where it is save to do so.

So while I see the problem I actually don't understand why you insist of 
solving it with the fence lock? That just brings us back to the bad design we 
had before where internal fence state is abused to protect something else.

Regards,
Christian.

Re: [RFC PATCH] dma-fence: Fix races of fence callbacks versus destructors by locking

Reply via email to