On Wed, 12 Nov 2025 14:48:51 +0100
Philipp Stanner <[email protected]> wrote:
> On Wed, 2025-11-12 at 14:31 +0100, Boris Brezillon wrote:
> > Hi Philipp,
> >
> > On Wed, 12 Nov 2025 13:38:00 +0100
> > Philipp Stanner <[email protected]> wrote:
> >
> > > On Wed, 2025-11-12 at 13:17 +0100, Boris Brezillon wrote:
> > > > From: Ashley Smith <[email protected]>
> > > >
> > > > The timeout logic provided by drm_sched leads to races when we try
> > > > to suspend it while the drm_sched workqueue queues more jobs. Let's
> > > > overhaul the timeout handling in panthor to have our own delayed
> > > > work that's resumed/suspended when a group is resumed/suspended.
> > > > When an actual timeout occurs, we still call drm_sched_fault() to
> > > > report it through drm_sched. But otherwise, the drm_sched timeout
> > > > is disabled (set to MAX_SCHEDULE_TIMEOUT), which leaves us in
> > > > control of how we protect modifications on the timer.
> > > >
> > > > One issue seems to be that we call drm_sched_suspend_timeout() from
> > > > both queue_run_job() and tick_work(), which can race because
> > > > drm_sched_suspend_timeout() takes no lock. Another issue seems to
> > > > be in queue_run_job(): if the group is not scheduled, we suspend
> > > > the timeout again, which undoes what drm_sched_job_begin() did when
> > > > calling drm_sched_start_timeout(). So the timeout does not reset
> > > > when a job is finished.
> > >
> > > […]
> > >
> > > > +
> > > > +static void
> > > > +queue_reset_timeout_locked(struct panthor_queue *queue)
> > > > +{
> > > > +	lockdep_assert_held(&queue->fence_ctx.lock);
> > > > +
> > > > +	if (queue->timeout.remaining != MAX_SCHEDULE_TIMEOUT) {
> > > > +		mod_delayed_work(queue->scheduler.timeout_wq,
> > >
> > > Here you are interfering with the scheduler's internals again, aren't
> > > you? I think we agreed that we don't want to do such things anymore,
> > > didn't we?
> >
> > We're not really touching drm_gpu_scheduler's internals, we're just
> > retrieving the timeout workqueue we passed at init time:
> > panthor_queue::timeout is panthor internals, not drm_sched internals.
> >
> > This being said, I agree we should use ptdev->reset.wq instead of
> > retrieving the timeout workqueue through the drm_gpu_scheduler object.
> >
> > Just to be clear, the goal of this patch is to bypass the
> > drm_gpu_scheduler timeout logic entirely, so we can have our own thing
> > that's not racy (see below).
>
> OK. timeout_wq sharing is intended and allowed, so if that's what
> you're doing, good. But I agree that accessing the wq through the
> driver's struct is then cleaner and more obviously correct.

Will do.

> > > You could write a proper drm_sched API function which serves your
> > > usecase.
> >
> > It's not really lack of support for our usecase that drives this
> > change, but more the fact that the current helpers are racy for
> > drivers that have a 1:1 entity:sched relationship with queues that
> > can be scheduled out behind drm_gpu_scheduler's back.
>
> And you also can't stop drm_sched to prevent races?

That's the thing: I don't want to stop the drm_gpu_scheduler attached
to a panthor_queue. I want new jobs to be queued to the ring buffer
until this ring buffer is full (which is controlled by the
::credit_limit property), even if the group this queue belongs to is
not currently active on the FW side. Those jobs will get executed at
some later point, when the group gets picked by the panthor scheduler.
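
To make the split concrete: the drm_sched timeout stays parked at
MAX_SCHEDULE_TIMEOUT, and when our own per-queue delayed work expires
we just forward that to drm_sched. Roughly something like this (a
sketch, the handler name is made up, and it assumes
panthor_queue::timeout.work is the delayed work added by this patch):

	static void
	queue_timeout_work(struct work_struct *work)
	{
		struct panthor_queue *queue =
			container_of(work, struct panthor_queue,
				     timeout.work.work);

		/* Our driver-side timer expired: let drm_sched run its
		 * usual timedout_job() path.
		 */
		drm_sched_fault(&queue->scheduler);
	}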

> > > >
> > >
> > > Or could maybe DRM_GPU_SCHED_STAT_NO_HANG be returned from your driver
> > > in case an interrupt actually fires?
> >
> > I don't think it helps, see below.
> >
> > > > +					 &queue->timeout.work,
> > > > +					 msecs_to_jiffies(JOB_TIMEOUT_MS));
> > > > +	}
> > > > +}
> > > > +
> > > > +static bool
> > > > +group_can_run(struct panthor_group *group)
> > > > +{
> > > > +	return group->state != PANTHOR_CS_GROUP_TERMINATED &&
> > > > +	       group->state != PANTHOR_CS_GROUP_UNKNOWN_STATE &&
> > > > +	       !group->destroyed && group->fatal_queues == 0 &&
> > > > +	       !group->timedout;
> > > > +}
> > > > +
> > >
> > > […]
> > >
> > > > +queue_suspend_timeout(struct panthor_queue *queue)
> > > > +{
> > > > +	spin_lock(&queue->fence_ctx.lock);
> > > > +	queue_suspend_timeout_locked(queue);
> > > > +	spin_unlock(&queue->fence_ctx.lock);
> > > > +}
> > > > +
> > > > +static void
> > > > +queue_resume_timeout(struct panthor_queue *queue)
> > > > +{
> > > > +	spin_lock(&queue->fence_ctx.lock);
> > > > +
> > > > +	if (queue_timeout_is_suspended(queue)) {
> > > > +		mod_delayed_work(queue->scheduler.timeout_wq,
> > >
> > > There is drm_sched_resume_timeout(). Why can it not be used?
> >
> > Because it's racy. I don't remember all the details, but IIRC, it had
> > to do with job insertion calling drm_sched_start_timeout() while we're
> > calling drm_sched_resume_timeout() from cs_slot_reset_locked(). We
> > tried to make it work, but we gave up at some point and went for a
> > driver-specific timer, because ultimately what we want is a per-queue
> > timeout that we can pause/resume without drm_sched interfering when
> > new jobs are queued to our ring buffers: we resume when the execution
> > context (AKA group) is scheduled in, and we pause when this execution
> > context is scheduled out.
> >
> > That's the very reason we set drm_gpu_scheduler::timeout to
> > MAX_SCHEDULE_TIMEOUT at init time (AKA, timeout disabled) and never
> > touch that again. When our driver-internal timer expires, we forward
> > the information to the drm_sched layer by calling drm_sched_fault().
>
> That sounds all... stressful ;)

Yeah, it's not ideal :-/.

> As you know, I only learned a few weeks ago about your group scheduler
> on top of drm_sched. I wish I had heard about it when it was
> implemented; we might have come up with the idea for drm_jobqueue
> sooner.

Might have simplified things, I guess, but that's life, and I'm happy
to transition to drm_jobqueue when it's deemed ready.

> > > > +				 &queue->timeout.work,
> > > > +				 queue->timeout.remaining);
> > > > +
> > > > +		queue->timeout.remaining = MAX_SCHEDULE_TIMEOUT;
> > > > +	}
> > > > +
> > > > +	spin_unlock(&queue->fence_ctx.lock);
> > > > +}
> > > > +
> > >
> > > […]
> > >
> > > > @@ -3270,6 +3379,11 @@ queue_timedout_job(struct drm_sched_job *sched_job)
> > > >
> > > >  	queue_start(queue);
> > > >
> > > > +	/* We already flagged the queue as faulty, make sure we don't get
> > > > +	 * called again.
> > > > +	 */
> > > > +	queue->scheduler.timeout = MAX_SCHEDULE_TIMEOUT;
> > > > +
> > > >  	return DRM_GPU_SCHED_STAT_RESET;
> > >
> > > DRM_GPU_SCHED_STAT_NO_HANG instead of just modifying the scheduler's
> > > internal data??
> >
> > 	queue->scheduler.timeout = MAX_SCHEDULE_TIMEOUT;
> >
> > is a leftover from a previous version. We shouldn't have to modify
> > that here because the timeout is initialized to MAX_SCHEDULE_TIMEOUT
> > and never touched again.
>
> So you agree that it can be removed in v8?

Yep, I think it can go away. I'll make sure it still works without it,
and I'll also get the wq from the driver's own fields instead of going
through drm_gpu_scheduler::timeout_wq.
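
i.e. something like this in queue_reset_timeout_locked() and friends
(just a sketch; it assumes the queue can reach its panthor_device,
e.g. through the group it belongs to):

	mod_delayed_work(ptdev->reset.wq, &queue->timeout.work,
			 msecs_to_jiffies(JOB_TIMEOUT_MS));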
