On Fri, 06 Mar 2026 10:54:07 +0100 Philipp Stanner <[email protected]> wrote:
> On Fri, 2026-03-06 at 10:46 +0100, Boris Brezillon wrote:
> > On Fri, 6 Mar 2026 09:10:52 +0100
> > Christian König <[email protected]> wrote:
> >
> > > On 3/5/26 16:12, Boris Brezillon wrote:
> > > > Hi,
> > > >
> > > > On Thu, 5 Mar 2026 14:59:02 +0100
> > > > Christian König <[email protected]> wrote:
> > > >
> > > > > On 3/5/26 14:54, Philipp Stanner wrote:
> > > > > > Yo Christian,
> > > > > >
> > > > > > a while ago we were discussing this problem
> > > > > >
> > > > > > dma_fence_set_error(f, -ECANCELED);
> > > >
> > > > If you really have two concurrent threads setting the error, this part
> > > > is racy, though I can't think of any situation where concurrent
> > > > signaling of a set of fences wouldn't be protected by another external
> > > > lock.
> > >
> > > This is actually massively problematic and the reason why we have the
> > > WARN_ON in dma_fence_set_error().
> > >
> > > What drivers usually do is to disable the normal signaling path, e.g.
> > > turn off interrupts, and then set an error and signal the
> > > fence manually.
> > >
> > > The problem is that this has a *huge* potential for being racy: for
> > > example, when you tell the HW not to give you an interrupt any more, it
> > > can always be that interrupt processing has already started but wasn't
> > > able yet to grab a lock or similar.
> > >
> > > I think we should start enforcing correct handling and have a lockdep
> > > check in dma_fence_set_error() that the dma_fence lock is held while
> > > calling it.
> >
> > Sure, I don't mind you dropping the non-locked variants and forcing
> > users to lock around set_error() + signal().
> >
> > > >
> > > > > > dma_fence_signal(f); // racy!
> > > >
> > > > This is not racy because dma_fence_signal() takes/releases the
> > > > lock internally. Besides, calling dma_fence_signal() on an already
> > > > signaled fence is considered an invalid pattern if I trust the -EINVAL
> > > > returned here[1].
> > >
> > > No, that is also something we want to remove. IIRC Philipp proposed some
> > > patches to clean that up already.
> >
> > What do you mean? You want dma_fence_signal_locked() (or the variants
> > of it) to not return an error when the fence is already signaled,
>
> Yes. That's already implemented:
>
> https://elixir.bootlin.com/linux/v7.0-rc1/source/drivers/dma-buf/dma-fence.c#L486

Okay, I guess I was looking at an older version of the code. My bad.

>
> Reason being that
> a) no one was ever checking that error code
> b) you cannot *prevent* multiple signaling in C anyway
> c) it's not even clear AFAICT whether signaling an already signaled
> fence is even an error. The function will simply ignore the action.
> It's not ideal design, sure, but what's the harm? The most important
> fence rule is that fences do get eventually signaled. Firing WARN_ONs
> or sth because you try to signal a signaled fence sounds bad to me,
> because what's the issue?

Fair enough. I'm not really questioning those changes, to be honest; I'm
just here to point out that the rust abstraction will hopefully be immune
to the stuff you're trying to protect against.

>
> > or
> > you want to prevent this double-signal from happening. The plan for the
> > rust abstraction is to do the latter.
>
> In Rust we sort of get that for free by having signal() consume the
> fence.

Exactly.

>
> > > > > > I think you mentioned that you are considering to redesign the
> > > > > > dma_fence API so that users have to take the lock themselves to
> > > > > > touch
> > > > > > the fence:
> > > > > >
> > > > > > dma_fence_lock(f);
> > > > > > dma_fence_set_error(f, -ECANCELED);
> > > > > > dma_fence_signal(f);
> > > >
> > > > I guess you mean dma_fence_signal_locked().
> > > >
> > > > > > dma_fence_unlock(f);
> > > > > >
> > > > > >
> > > > > > Is that still up to date? Is there work in progress about that?
> > > > >
> > > > > It's on my "maybe if I ever have time for that" list, but yeah I
> > > > > think it would be really nice to have and a great cleanup.
> > > > >
> > > > > We have a bunch of different functions which provide both a _locked()
> > > > > and _unlocked() variant just because callers were too lazy to lock
> > > > > the fence.
> > > > >
> > > > > Especially the dma_fence_signal function is overloaded 4 (!) times
> > > > > with locked/unlocked and with and without timestamp functions.
> > > > >
> > > > > > I discovered that I might need / want that for the Rust
> > > > > > abstractions.
> > > > >
> > > > > Well my educated guess is for Rust you only want the locked function
> > > > > and never allow callers to be lazy.
> > > >
> > > > I don't think we have an immediate need for manual locking in rust
> > > > drivers (no signaling done under an already dma_fence-locked section
> > > > that I can think of), especially after the inline_lock you've
> > > > introduced. Now, I don't think it matters if only the _locked() variant
> > > > is exposed and the rust code is expected to acquire/release the lock
> > > > manually, all I'm saying is that we probably don't need that in drivers
> > > > (might be different if we start implementing fence containers like
> > > > arrays and chain in rust, but I don't think we have an immediate need
> > > > for that).
> > >
> > > Well as I wrote above you either have super reliable locking in your
> > > signaling path or you will need that for error handling.
> >
> > Not really. With rust's ownership model, you can make it so only one
> > thread gets to own the DriverFence (the signal-able fence object),
>
> Not strictly speaking. They can always stuff it into some locked
> refcounted container.

And that's my point: you've protected the container with some lock, and
if the DriverFence is signaled under that lock, it goes away, meaning the
other thread walking that very same container later on won't see it
anymore. So yes, there are ways you can move a DriverFence between threads
(the `Send` trait in rust), but there's no way rust will let you signal
DriverFence objects concurrently (either you wrap it in a Lock that
serializes accesses, or it will just refuse to compile).
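
To make the ownership argument concrete, here is a minimal userspace sketch using only Rust's standard library. `DriverFence` borrows its name from this thread, but the type, along with `Fence`, `FenceState`, and the `signal()` signature, is hypothetical and is not the kernel's Rust dma_fence abstraction; `std::sync::Mutex` stands in for both the fence lock and the container lock. It shows two things: `signal()` takes `self` by value, so a second call on the same object does not compile, and two threads draining a lock-protected container cannot both signal the same fence, because whichever thread takes the lock first moves every fence out.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Linux errno value for ECANCELED; the kernel passes it negated.
const ECANCELED: i32 = 125;

/// Hypothetical fence state; the Mutex around it stands in for the fence lock.
#[derive(Debug, Default)]
struct FenceState {
    signaled: bool,
    error: Option<i32>,
    signal_count: u32, // bookkeeping for this sketch only
}

/// Hypothetical read-only handle: clonable, can observe but never signal.
#[derive(Clone)]
struct Fence(Arc<Mutex<FenceState>>);

impl Fence {
    fn is_signaled(&self) -> bool {
        self.0.lock().unwrap().signaled
    }
    fn error(&self) -> Option<i32> {
        self.0.lock().unwrap().error
    }
    fn signal_count(&self) -> u32 {
        self.0.lock().unwrap().signal_count
    }
}

/// Hypothetical signal-able object. Not Clone, so it has exactly one owner.
struct DriverFence(Arc<Mutex<FenceState>>);

impl DriverFence {
    fn new() -> (DriverFence, Fence) {
        let state = Arc::new(Mutex::new(FenceState::default()));
        (DriverFence(Arc::clone(&state)), Fence(state))
    }

    /// Sets the optional error and marks the fence signaled with the lock held,
    /// like lock + set_error + signal_locked + unlock in C. Taking `self` by
    /// value consumes the DriverFence, so it cannot be signaled twice.
    fn signal(self, error: Option<i32>) {
        let mut state = self.0.lock().unwrap();
        state.error = error;
        state.signaled = true;
        state.signal_count += 1;
    }
}

fn main() {
    // Double signaling is a compile-time error:
    //     let (df, _f) = DriverFence::new();
    //     df.signal(None);
    //     df.signal(None); // error[E0382]: use of moved value: `df`

    // A lock-protected, refcounted container of DriverFences.
    let pending: Arc<Mutex<Vec<DriverFence>>> = Arc::new(Mutex::new(Vec::new()));
    let mut observers = Vec::new();
    for _ in 0..4 {
        let (df, f) = DriverFence::new();
        pending.lock().unwrap().push(df);
        observers.push(f);
    }

    // Two threads race to cancel every pending fence.
    let workers: Vec<_> = (0..2)
        .map(|_| {
            let pending = Arc::clone(&pending);
            thread::spawn(move || {
                let mut guard = pending.lock().unwrap();
                // drain() moves each DriverFence out while the container lock is
                // held; whichever thread runs second finds the Vec empty.
                for df in guard.drain(..) {
                    df.signal(Some(-ECANCELED));
                }
            })
        })
        .collect();
    for w in workers {
        w.join().unwrap();
    }

    for f in &observers {
        assert!(f.is_signaled());
        assert_eq!(f.error(), Some(-ECANCELED));
        assert_eq!(f.signal_count(), 1);
    }
    println!("{} fences signaled exactly once each", observers.len());
}
```

The same property would hold if the threads moved the fences out of the container and signaled them after dropping the container lock: what rules out double signaling in this sketch is the move, not the lock.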
