On Thu, 2026-03-05 at 09:16 -0300, Maíra Canal wrote:
> Hi Iago and Tvrtko,
> 
> Thanks for the reviews!
> 
> On 05/03/26 08:21, Iago Toral wrote:
> > On Thu, 2026-03-05 at 10:50 +0000, Tvrtko Ursulin wrote:
> > > 
> > > On 05/03/2026 10:35, Iago Toral wrote:
> > > > On Thu, 2026-03-05 at 10:25 +0000, Tvrtko Ursulin wrote:
> > > > > 
> > > > > On 05/03/2026 10:18, Iago Toral wrote:
> > > > > > On Thu, 2026-03-05 at 09:34 +0000, Tvrtko Ursulin wrote:
> > > > > > > 
> > > > > > > On 05/03/2026 09:15, Iago Toral wrote:
> > > > > > > > On Tue, 2026-02-17 at 09:18 -0300, Maíra Canal wrote:
> > > > > > > > > From: Tvrtko Ursulin <[email protected]>
> > > > > > > > > 
> > > > > > > > > To remove the file_priv NULL-ing dance needed to check
> > > > > > > > > if the file descriptor is open, move the per-fd reset
> > > > > > > > > counter into v3d_stats, which is heap-allocated and
> > > > > > > > > refcounted, outliving the fd as long as jobs reference
> > > > > > > > > it.
> > > > > > > > > 
> > > > > > > > > This change allows the removal of the last `queue_lock`
> > > > > > > > > usage to protect `job->file_priv` and avoids possible
> > > > > > > > > NULL ptr dereference issues due to lifetime mismatches.
> > > > > > > > > 
> > > > > > > > > Also, to simplify locking, replace both the global and
> > > > > > > > > per-fd locked reset counters with atomics.
> > > > > > > > > 
> > > > > > > > > Signed-off-by: Tvrtko Ursulin <[email protected]>
> > > > > > > > > Co-developed-by: Maíra Canal <[email protected]>
> > > > > > > > > Signed-off-by: Maíra Canal <[email protected]>
> > > > > > > > > ---
> > > > > > > > >  drivers/gpu/drm/v3d/v3d_drv.c   | 20 ++++----------------
> > > > > > > > >  drivers/gpu/drm/v3d/v3d_drv.h   | 14 ++++----------
> > > > > > > > >  drivers/gpu/drm/v3d/v3d_sched.c |  9 ++-------
> > > > > > > > >  3 files changed, 10 insertions(+), 33 deletions(-)
> > > > > > > > > 
> > > > > > > > > diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c
> > > > > > > > > index aafb402c6ac3118a57df9fc0a0d21d35d48e3b2c..4e77f4808145df21746ff4b7058089d0d161e3fc 100644
> > > > > > > > > --- a/drivers/gpu/drm/v3d/v3d_drv.c
> > > > > > > > > +++ b/drivers/gpu/drm/v3d/v3d_drv.c
> > > > > > > > > @@ -110,13 +110,13 @@ static int v3d_get_param_ioctl(struct drm_device *dev, void *data,
> > > > > > > > >  		args->value = !!drm_gem_get_huge_mnt(dev);
> > > > > > > > >  		return 0;
> > > > > > > > >  	case DRM_V3D_PARAM_GLOBAL_RESET_COUNTER:
> > > > > > > > > -		mutex_lock(&v3d->reset_lock);
> > > > > > > > > -		args->value = v3d->reset_counter;
> > > > > > > > > -		mutex_unlock(&v3d->reset_lock);
> > > > > > > > > +		args->value = atomic_read(&v3d->reset_counter);
> > > > > > > > 
> > > > > > > > Don't we still need to take the reset lock here?
> > > > > > > > Otherwise there would be a chance that we read the counter
> > > > > > > > while a reset is in flight, no?
> > > > > > > 
> > > > > > > I don't see that it would make a difference, but maybe I am
> > > > > > > not seeing your concern. It uses atomic_t, so the increment
> > > > > > > versus read is fine. Are you maybe saying the v3d ABI
> > > > > > > guarantees reset is 100% done (so not in progress, for some
> > > > > > > definition of progress, because hardware reset is done by
> > > > > > > then, only re-submit and re-start of the software state is
> > > > > > > pending) if userspace observes an increased global reset
> > > > > > > counter? That would be surprising and I don't see how it
> > > > > > > could make a practical difference, but perhaps it could be
> > > > > > > mitigated by moving the atomic_inc to the end of
> > > > > > > v3d_gpu_reset_for_timeout(). Or by still taking the lock,
> > > > > > > as you say.
> > > > > > 
> > > > > > My concern is just that it is possible for the query and the
> > > > > > reset to race, and I think it would make sense for the
> > > > > > counter query to include in-flight resets (since what apps
> > > > > > really care about is whether a GPU reset happened, not
> > > > > > whether the reset process completed).
> > > > > 
> > > > > Then there is no problem, I think. Mutex lock or not, in both
> > > > > cases it is not guaranteed that a reset is not in progress at
> > > > > the time of the ioctl. Even if the ioctl does not return an
> > > > > increased counter, perhaps the reset handler is running but
> > > > > hasn't grabbed the mutex yet.
> > > > 
> > > > That's true, but then I wonder: what is the rationale for still
> > > > taking the lock when resolving the
> > > > DRM_V3D_PARAM_CONTEXT_RESET_COUNTER query?
> > > 
> > > Good question, and I asked myself the same this morning. For full
> > > disclosure, I wrote this patch back in Sep'25, so between then and
> > > now I forgot a thing or two.
> > > 
> > > In the latest local branch that I can find, I had it without the
> > > mutex even for DRM_V3D_PARAM_CONTEXT_RESET_COUNTER. Maíra, was
> > > there a newer version somewhere which I forgot about?
> > > 
> > > The mutex would make sense if there was any chance for paired jobs
> > > across two engines to get reset at the same time. I think Maíra
> > > was explaining to me that could be a possibility, maybe with some
> > > future rework. Unless I am confusing things.
> > > 
> > > In any case, in the current upstream v3d it indeed looks to be
> > > safe with no mutex.
> > 
> > Ok, thanks for checking the history here! Let's see if Maíra has
> > anything to add to this so we have the full picture, and then we
> > can decide if we can also drop the remaining lock (and I guess in
> > that case the reset mutex too).
> 
> Initially, I was trying to make sure we had some consistency between
> queues, so that the result for the user is a reliable snapshot from
> all queues. So, for example, let's say we are iterating the loop and
> we have reached q=3 (CSD) when the BIN queue reset counter is
> incremented. In such a scenario, the user would get outdated
> information.
I am not sure that we strictly need to handle this scenario, but I am
okay with keeping the lock for this. Maybe we can add a small comment
explaining the rationale. With that, patch 5 is also:

Reviewed-by: Iago Toral Quiroga <[email protected]>

> If we take the lock, this scenario wouldn't happen, as the timeout
> hook wouldn't be able to take the lock. Feel free to correct me if
> this train of thought is mistaken (or if you believe this level of
> consistency is not needed).
> 
> About removing the reset mutex, I don't believe it's possible, as we
> need it to make sure that two queues are not resetting at the same
> time. Also, I'll use this mutex (and the reset counter) in a future
> patch to address redundant resets that happen when two queues are
> competing for the `reset_lock`.

Makes sense, thanks Maíra.

Iago

> Best regards,
> - Maíra
