On Mon, 2026-06-08 at 20:47 +0200, Christian König wrote:
> On 6/8/26 20:39, Danilo Krummrich wrote:
> > On Mon Jun 8, 2026 at 8:32 PM CEST, Christian König wrote:
> > > On 6/8/26 19:59, Danilo Krummrich wrote:
> > > > On Mon Jun 8, 2026 at 7:34 PM CEST, Christian König wrote:
> > > > > That's why we need the RCU grace period to make sure that nobody is
> > > > > referencing the driver stuff any more.
> > > > 
> > > > Right, and that's what Philipp tries to address. The requirement to
> > > > wait for an RCU grace period is perfectly fine if it is only about
> > > > freeing memory, but it can become painful if the fence private data
> > > > contains data that also needs to be destructed in some way.
> > > 
> > > Yeah that makes sense.
> > > 
> > > > IOW, if a driver signals a fence, it is lifecycle-wise reasonable to
> > > > destruct the private data that is no longer needed (remaining users
> > > > only deal with struct dma_fence), and having to wait for a full grace
> > > > period adds subtlety and complication that can be avoided with the
> > > > proposed approach.
> > > 
> > > Yeah, I've run into that when I tried to make the amdgpu fences
> > > independent as well.
> > > 
> > > > That said, I'd like to ask the opposite question: What are the
> > > > concerns with the proposed approach over (pure) RCU?
> > > 
> > > Well a) locking inversions and b) performance.
> > > 
> > > For example the reason why we have the dma_fence_is_signaled() and
> > > dma_fence_is_signaled_locked() variants is because there is a measurable
> > > difference in some specific use cases for not grabbing the locks.
> > 
> > I checked for this as well, but couldn't find a case where
> > dma_fence_is_signaled() is used in a way where it would be performance
> > critical to avoid the lock in any way.
> > 
> > Note that the lock is only bypassed when the fence is already signaled
> > (this would be preserved) or when signaled() returns false; otherwise
> > dma_fence_signal() will take the lock anyway.
> > 
> > > I personally find those micro-optimizations rather questionable, but the
> > > community agreement is that we should have them.
> > 
> > I agree, it is rather questionable. So, I wouldn't make this the deciding
> > factor unless someone can present a valid case where it actually matters.
> > 
> > > So my take would rather be that the dma_fence_is_signaled_locked() variant
> > > goes away and we consistently call the ops pointers without holding the
> > > dma_fence lock and the driver implementations can then optionally take it
> > > if necessary.
> > 
> > How did you get to this conclusion considering that you ran into what I
> > mentioned above as well and the fact that we seem to agree that the
> > performance concern is rather questionable?
> 
> Quite simple, it's the cleaner approach.

Depends on the definition of "clean". Considering the scheduler
disaster, I learned that there is nothing cleaner than locking. It
provides perfect synchronization. It would eliminate the race that we
see for the Rust abstractions, and likely also the one that you have
seen.

The more of the heavy lifting the API does, the less likely it becomes
that drivers even start to interfere with your API internals, like
taking your internal locks to work around race conditions in the API
(talking about drm_sched again).

> 
> Calling callbacks with locks held is rather questionable even putting the
> performance issue aside.

We already see that a few drivers have to take the lock anyway, which
suggests that it's probably necessary to take it for the racy conditions
this patch aims to address.
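
To make the two calling conventions under discussion concrete, here is a
minimal user-space sketch. It is only a model under stated assumptions:
every name in it (toy_fence, toy_is_signaled_core_locked() and so on) is
hypothetical, it uses a pthread mutex in place of the fence spinlock, and
it is not the dma_fence implementation. Both variants keep the lockless
fast path for an already-signaled fence; they differ only in who takes
the lock around the driver callback.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names are kernel API. */
struct toy_fence {
	pthread_mutex_t lock;                 /* stands in for the fence lock */
	atomic_bool signaled;                 /* stands in for the signaled bit */
	bool (*hw_done)(struct toy_fence *f); /* driver "signaled" callback */
	int hw_seqno;                         /* driver private state */
	int want_seqno;
};

/* Driver callback that relies on the caller holding f->lock. */
static bool toy_hw_done(struct toy_fence *f)
{
	return f->hw_seqno >= f->want_seqno;
}

/* Variant A: the core holds the lock while calling into the driver. */
static bool toy_is_signaled_core_locked(struct toy_fence *f)
{
	bool done;

	if (atomic_load(&f->signaled))
		return true; /* fast path, no lock taken */

	pthread_mutex_lock(&f->lock);
	done = atomic_load(&f->signaled) || f->hw_done(f);
	if (done)
		atomic_store(&f->signaled, true);
	pthread_mutex_unlock(&f->lock);
	return done;
}

/* Driver callback for variant B: takes the lock itself if it needs it. */
static bool toy_hw_done_self_locking(struct toy_fence *f)
{
	bool done;

	pthread_mutex_lock(&f->lock);
	done = f->hw_seqno >= f->want_seqno;
	pthread_mutex_unlock(&f->lock);
	return done;
}

/* Variant B: the core calls the driver without holding the lock. */
static bool toy_is_signaled_core_unlocked(struct toy_fence *f)
{
	if (atomic_load(&f->signaled))
		return true; /* fast path, no lock taken */

	if (!f->hw_done(f))
		return false;

	/* Signalling still takes the lock, as in variant A. */
	pthread_mutex_lock(&f->lock);
	atomic_store(&f->signaled, true);
	pthread_mutex_unlock(&f->lock);
	return true;
}
```

In this model, variant B ends up taking the lock twice on the slow path
once a driver needs it in its callback, and there is a window between the
callback returning and the signaled flag being set.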

> 
> In detail, calling the callbacks without holding locks allows all
> implementations that need it to explicitly take locks in the order they want.

Didn't you say a few mails above that the implementation should not use
the fence lock for its own purposes?


P.

> 
> If you call it with the lock held you enforce the fence lock to be the
> outermost lock.
> 
> Regards,
> Christian.
