On Tue, 14 Oct 2025 at 01:11, Christian König wrote:
>
> Hi everyone,
>
> dma_fences have always lived under the tyranny dictated by the module
> lifetime of their issuer, leading to crashes should anybody still hold
> a reference to a dma_fence when the issuer's module is unloaded.
>
> But those days are over! The patch set following this mail finally
> implements a way for issuers to release their dma_fences from this
> slavery so that they can outlive the module that originally created them.
>
> Various approaches have been discussed previously, including changing the
> locking semantics of the dma_fence callbacks (by me) and using the
> drm scheduler as an intermediate layer (by Sima) to disconnect dma_fences
> from their actual users.
>
> Changing the locking semantics turned out to be much trickier than
> originally thought, because older drivers in particular (nouveau, radeon,
> but also i915) actually need these locking semantics for correct
> operation.
>
> Using the drm_scheduler as an intermediate layer is still a good idea and
> should probably be implemented to make life simpler for some drivers, but
> it doesn't work for all use cases. In particular, TLB flush fences,
> preemption fences and userqueue fences don't go through the drm scheduler
> because it doesn't make sense for them.
>
> Tvrtko did some really nice prerequisite work by protecting the returned
> strings of the dma_fence_ops by RCU. This way dma_fence creators were
> able to just wait for an RCU grace period after fence signaling before
> it was safe to free those data structures.
>
> This patch set goes a step further and protects the whole
> dma_fence_ops structure by RCU, so that after the fence signals, the
> pointer to the dma_fence_ops is set to NULL when neither a wait nor a
> release callback is given. All functionality that uses the dma_fence_ops
> reference is put inside an RCU critical section, except for the
> deprecated issuer-specific wait and, of course, the optional release
> callback.
>
> In addition to the RCU changes: the lock protecting the dma_fence state
> previously had to be allocated externally. This set makes that external
> lock optional and allows dma_fences to use an inline lock and be
> self-contained.
>
> The new approach is then applied to amdgpu, allowing the module to be
> unloaded even when dma_fences issued by it are still around.

Can we add some "why" in here, like what use cases does this enable?

Some more explanation about what these hanging-about fences will be
used for would help. If the module has gone away, I have to assume this
is for already signalled fences, so someone is waiting and hasn't cleaned
up yet?

What problem does it solve wrt module unload? In what scenario is
unloading amdgpu not possible now, and in what scenario will it be
possible afterwards?

Thanks,

Dave.
