Hi Iago and Tvrtko,

Thanks for the reviews!

On 05/03/26 08:21, Iago Toral wrote:
On Thu, 05-03-2026 at 10:50 +0000, Tvrtko Ursulin wrote:

On 05/03/2026 10:35, Iago Toral wrote:
On Thu, 05-03-2026 at 10:25 +0000, Tvrtko Ursulin wrote:

On 05/03/2026 10:18, Iago Toral wrote:
On Thu, 05-03-2026 at 09:34 +0000, Tvrtko Ursulin wrote:

On 05/03/2026 09:15, Iago Toral wrote:
On Tue, 17-02-2026 at 09:18 -0300, Maíra Canal wrote:
From: Tvrtko Ursulin <[email protected]>

To remove the file_priv NULL-ing dance needed to check if the file
descriptor is open, move the per-fd reset counter into v3d_stats,
which is heap-allocated and refcounted, outliving the fd as long as
jobs reference it.

This change allows the removal of the last `queue_lock` usage to
protect `job->file_priv` and avoids possible NULL ptr dereference
issues due to lifetime mismatches.

Also, to simplify locking, replace both the global and per-fd locked
reset counters with atomics.

Signed-off-by: Tvrtko Ursulin <[email protected]>
Co-developed-by: Maíra Canal <[email protected]>
Signed-off-by: Maíra Canal <[email protected]>
---
     drivers/gpu/drm/v3d/v3d_drv.c   | 20 ++++----------------
     drivers/gpu/drm/v3d/v3d_drv.h   | 14 ++++----------
     drivers/gpu/drm/v3d/v3d_sched.c |  9 ++-------
     3 files changed, 10 insertions(+), 33 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c
index aafb402c6ac3118a57df9fc0a0d21d35d48e3b2c..4e77f4808145df21746ff4b7058089d0d161e3fc 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.c
+++ b/drivers/gpu/drm/v3d/v3d_drv.c
@@ -110,13 +110,13 @@ static int v3d_get_param_ioctl(struct drm_device *dev, void *data,
 		args->value = !!drm_gem_get_huge_mnt(dev);
 		return 0;
 	case DRM_V3D_PARAM_GLOBAL_RESET_COUNTER:
-		mutex_lock(&v3d->reset_lock);
-		args->value = v3d->reset_counter;
-		mutex_unlock(&v3d->reset_lock);
+		args->value = atomic_read(&v3d->reset_counter);

Don't we still need to take the reset lock here? Otherwise there would
be a chance that we read the counter while a reset is in flight, no?

I don't see that it would make a difference, but maybe I am not seeing
your concern. It uses atomic_t, so the increment versus read is fine.
Are you maybe saying the v3d ABI guarantees reset is 100% done (so not
in progress, for some definition of progress, because the hardware
reset is done by then and only the re-submit and re-start of the
software state is pending) if userspace observes an increased global
reset counter? That would be surprising and I don't see how it could
make a practical difference, but perhaps it could be mitigated by
moving the atomic_inc to the end of v3d_gpu_reset_for_timeout(). Or by
still taking the lock, as you say.

My concern is just that it is possible for the query and the reset to
race, and I think it would make sense for the counter query to include
in-flight resets (since what apps really care about is whether a GPU
reset happened, not whether the reset process completed).

Then there is no problem, I think. Mutex lock or not, in both cases it
is not guaranteed that a reset is not in progress at the time of the
ioctl. Even if the ioctl does not return an increased counter, perhaps
the reset handler is running but hasn't grabbed the mutex yet.



That's true, but then I wonder: what is the rationale for still taking
the lock when resolving the DRM_V3D_PARAM_CONTEXT_RESET_COUNTER query?

Good question, and I asked myself the same this morning. For full
disclosure, I wrote this patch back in Sep'25, so between then and now
I forgot a thing or two.

In the latest local branch that I can find, I had it without the mutex
even for DRM_V3D_PARAM_CONTEXT_RESET_COUNTER. Maira, was there a newer
version somewhere which I forgot about?

A mutex would make sense if there was any chance for paired jobs
across two engines to get reset at the same time. I think Maira was
explaining to me that could be a possibility, maybe with some future
rework. Unless I am confusing things.

In any case, in the current upstream v3d it indeed looks to be safe
with no mutex.

Ok, thanks for checking the history here! Let's see if Maíra has
anything to add to this so we have the full picture and then we can
decide if we can also drop the remaining lock (and I guess in that case
the reset mutex too).

Initially, I was trying to make sure we had some consistency between
queues, so that the result for the user is a reliable snapshot of all
queues. So, for example, let's say we are iterating the loop and we
have reached q=3 (CSD) when the BIN queue reset counter is
incremented. In such a scenario, the user would get outdated
information.

If we take the lock, this scenario wouldn't happen, as the timeout
hook wouldn't be able to take the lock. Feel free to correct me if
this train of thought is mistaken (or if you believe this level of
consistency is not needed).

About removing the reset mutex, I don't believe it's possible, as we
need it to make sure that two queues are not resetting at the same
time. Also, I'll use this mutex (and the reset counter) in a future
patch to address redundant resets that happen when two queues are
competing for the `reset_lock`.

Best regards,
- Maíra

