On Tue, 2026-06-09 at 12:53 +0200, Christian König wrote:
> > 
> > // driver
> > dma_fence_signal(f); // revokes all accesses to our driver through backend_ops
> > // synchronize_rcu() now unnecessary \o/
> > cleanup(f); // We know that all accessors are gone
> > dma_fence_put(f);
> 
> Yeah and exactly that doesn't work.
> 
> Just think about the Nouveau case when you have your fences on a doubly 
> linked list.
> 
> When the fence lock is independent, e.g. there is a separate lock for each 
> fence, then this lock can't protect this doubly linked list.
> 
> So your cleanup path needs to take a lock which protects the list, but you 
> then run into lock inversion.


static bool nouveau_fence_is_signaled(struct dma_fence *f)
{
        struct nouveau_fence *fence = to_nouveau_fence(f);
        struct nouveau_fence_chan *fctx = nouveau_fctx(fence);
        struct nouveau_channel *chan;
        bool ret = false;

        rcu_read_lock();
        chan = rcu_dereference(fence->channel);
        if (chan)
                ret = (int)(fctx->read(chan) - fence->base.seqno) >= 0;
        rcu_read_unlock();

        return ret;
}
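
For reference, the comparison in there is the usual wraparound-safe
sequence check. A minimal userspace sketch, assuming 32-bit sequence
numbers; seq_passed() is a hypothetical name for illustration, not a
kernel or Nouveau helper:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): returns true once the counter
 * read back from the hardware has reached or passed the fence's seqno.
 *
 * The unsigned subtraction wraps modulo 2^32; interpreting the result as
 * signed keeps the comparison correct across a counter wraparound, as
 * long as the two values are less than 2^31 apart. The conversion to
 * int32_t assumes two's complement, which the kernel relies on as well.
 */
static bool seq_passed(uint32_t hw_seq, uint32_t fence_seq)
{
        return (int32_t)(hw_seq - fence_seq) >= 0;
}
```

With that, seq_passed(2, 0xfffffffe) is true, because the counter has
wrapped past the fence's seqno, while seq_passed(0xfffffffe, 2) is
false. No lock is involved in the check itself.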


AFAICT fctx->read() does not take f->lock. So where is the lock
inversion?


Again, ideally we can get to the point where no one except for the
fence subsystem itself has to take the lock manually anymore.

> 
> > > 
> > > So you are left with few options: Either the fence lock is external,
> > > which we don't want because that makes the fence non-independent, or
> > > cleanup() defers work to irq_work or work_structs, which creates
> > > numerous lifetime issues.
> > 
> > Yup, this is uncool and we want to avoid that.
> > 
> > But these seem to be the options
> > 
> > 1. Ensure proper synchronization
> > 2. Wait for a grace period in a hot path
> > 3. Defer cleanup() with some delay mechanism
> > 
> > #1 is by far the cleanest approach. I still cannot see any downside,
> > and quite a few upsides.
> > 
> > https://elixir.bootlin.com/linux/v7.1-rc6/source/drivers/dma-buf/dma-fence.c#L1025
> > 
> > ^ is already racing with the signaled check.
> 
> Yeah so what? That is just an opportunistic check. 

What happens if someone signals the fence while the set_deadline()
callback is running?



P.
