On 05/03/2026 12:16, Maíra Canal wrote:
Hi Iago and Tvrtko,
Thanks for the reviews!
On 05/03/26 08:21, Iago Toral wrote:
On Thu, 2026-03-05 at 10:50 +0000, Tvrtko Ursulin wrote:
On 05/03/2026 10:35, Iago Toral wrote:
On Thu, 2026-03-05 at 10:25 +0000, Tvrtko Ursulin wrote:
On 05/03/2026 10:18, Iago Toral wrote:
On Thu, 2026-03-05 at 09:34 +0000, Tvrtko Ursulin wrote:
On 05/03/2026 09:15, Iago Toral wrote:
On Tue, 2026-02-17 at 09:18 -0300, Maíra Canal wrote:
From: Tvrtko Ursulin <[email protected]>
To remove the file_priv NULL-ing dance needed to check if the file
descriptor is open, move the per-fd reset counter into v3d_stats, which
is heap-allocated and refcounted, outliving the fd as long as jobs
reference it.

This change allows the removal of the last `queue_lock` usage to
protect `job->file_priv` and avoids possible NULL ptr dereference
issues due to lifetime mismatches.

Also, to simplify locking, replace both the global and per-fd locked
reset counters with atomics.
Signed-off-by: Tvrtko Ursulin <[email protected]>
Co-developed-by: Maíra Canal <[email protected]>
Signed-off-by: Maíra Canal <[email protected]>
---
 drivers/gpu/drm/v3d/v3d_drv.c   | 20 ++++----------------
 drivers/gpu/drm/v3d/v3d_drv.h   | 14 ++++----------
 drivers/gpu/drm/v3d/v3d_sched.c |  9 ++-------
 3 files changed, 10 insertions(+), 33 deletions(-)
diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c
index aafb402c6ac3118a57df9fc0a0d21d35d48e3b2c..4e77f4808145df21746ff4b7058089d0d161e3fc 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.c
+++ b/drivers/gpu/drm/v3d/v3d_drv.c
@@ -110,13 +110,13 @@ static int v3d_get_param_ioctl(struct drm_device *dev, void *data,
 		args->value = !!drm_gem_get_huge_mnt(dev);
 		return 0;
 	case DRM_V3D_PARAM_GLOBAL_RESET_COUNTER:
-		mutex_lock(&v3d->reset_lock);
-		args->value = v3d->reset_counter;
-		mutex_unlock(&v3d->reset_lock);
+		args->value = atomic_read(&v3d->reset_counter);
Don't we still need to take the reset lock here? Otherwise there would
be a chance that we read the counter while a reset is in flight, no?
I don't see that it would make a difference, but maybe I am not seeing
your concern. It uses atomic_t so the increment versus read is fine.
Are you maybe saying the v3d ABI guarantees reset is 100% done (so not
in progress, for some definition of progress, because hardware reset is
done by then, only re-submit and re-start of the software state is
pending) if userspace observes an increased global reset counter? That
would be surprising and I don't see how it could make a practical
difference, but perhaps it could be mitigated by moving the atomic_inc
to the end of v3d_gpu_reset_for_timeout(). Or by still taking the lock
as you say.
My concern is just that it is possible for the query and the reset to
race, and I think it would make sense for the counter query to include
in-flight resets (since what apps really care about is whether a GPU
reset happened, not whether it completed the reset process).
Then there is no problem, I think. Mutex lock or not, in both cases it
is not guaranteed that a reset is not in progress at the time of the
ioctl. Even if the ioctl does not return an increased counter, perhaps
the reset handler is running but hasn't grabbed the mutex yet.
That's true, but then I wonder: what is the rationale for still taking
the lock when resolving the DRM_V3D_PARAM_CONTEXT_RESET_COUNTER query?
Good question, and I asked myself the same this morning. For full
disclosure, I wrote this patch back in Sep'25, so between then and now
I forgot a thing or two.

In the latest local branch that I can find I had it without the mutex
even for DRM_V3D_PARAM_CONTEXT_RESET_COUNTER. Maíra, was there a newer
version somewhere which I forgot about?

A mutex would make sense if there was any chance for paired jobs across
two engines to get reset at the same time. I think Maíra was explaining
to me that could be a possibility, maybe with some future rework.
Unless I am confusing things.

In any case, in the current upstream v3d it indeed looks to be safe
with no mutex.
Ok, thanks for checking the history here! Let's see if Maíra has
anything to add to this so we have the full picture and then we can
decide if we can also drop the remaining lock (and I guess in that case
the reset mutex too).
Initially, I was trying to make sure we had some consistency between
queues, so that the result for the user is a reliable snapshot across
all queues. So, for example, let's say we are iterating the loop and
have reached q=3 (CSD) when the BIN queue reset counter is incremented.
In such a scenario, the user would get outdated information.

If we take the lock, this scenario wouldn't happen, as the timeout hook
wouldn't be able to take the lock. Feel free to correct me if this
train of thought is mistaken (or if you believe this level of
consistency is not needed).
From the userspace point of view it is the same as if the querying
thread got scheduled a smidgen earlier, managed to grab the mutex
first, and still got the old count, no? The hang and reset might have
been in progress already, just the worker did not run yet. Or the
thread will only see it on the next query. The ordering of these
queries and job submission is undefined anyway, so they cannot be used
in any ordered way.

The only scenario where I see the mutex could be relevant is if there
was scope for more than one queue reset count to increment in the same
reset handler invocation. But at the moment we don't have that. Or if
there was some hidden guarantee that the count increase must only be
observed once all processing has been completely done, including
re-submitting the jobs and restarting the scheduler. But that would be
surprising, and I don't see why userspace or the driver would or should
ever care about such internal implementation details.

Am I missing something? If I am not, it is still fine to keep the
mutex, but I would suggest it would be better to remove it so it is not
confusing why one path has it and the other does not.
Regards,
Tvrtko
About removing the reset mutex, I don't believe it's possible, as we
need it to make sure that two queues are not resetting at the same
time. Also, I'll use this mutex (and the reset counter) in a future
patch to address the redundant resets that happen when two queues are
competing for the `reset_lock`.
Best regards,
- Maíra