On 12/4/25 17:04, Alex Deucher wrote:
> On Wed, Dec 3, 2025 at 4:24 AM Philipp Stanner <[email protected]> wrote:
>>
>> +Cc Alex, Christian, Danilo
>>
>>
>> On Mon, 2025-12-01 at 10:39 -0800, Matthew Brost wrote:
>>> Stop open coding the pending job list in drivers. Add a pending job
>>> list iterator which safely walks the DRM scheduler's list, asserting
>>> that the DRM scheduler is stopped.
>>>
>>> v2:
>>>  - Fix checkpatch (CI)
>>> v3:
>>>  - Drop locked version (Christian)
>>> v4:
>>>  - Reorder patch (Niranjana)
>>
>> Same with the changelog.
>>
>>>
>>> Signed-off-by: Matthew Brost <[email protected]>
>>> Reviewed-by: Niranjana Vishwanathapura <[email protected]>
>>> ---
>>>  include/drm/gpu_scheduler.h | 50 +++++++++++++++++++++++++++++++++++++
>>>  1 file changed, 50 insertions(+)
>>>
>>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
>>> index 385bf34e76fe..9d228513d06c 100644
>>> --- a/include/drm/gpu_scheduler.h
>>> +++ b/include/drm/gpu_scheduler.h
>>> @@ -730,4 +730,54 @@ static inline bool drm_sched_job_is_signaled(struct drm_sched_job *job)
>>>  		dma_fence_is_signaled(&s_fence->finished);
>>>  }
>>>
>>> +/**
>>> + * struct drm_sched_pending_job_iter - DRM scheduler pending job iterator state
>>> + * @sched: DRM scheduler associated with pending job iterator
>>> + */
>>> +struct drm_sched_pending_job_iter {
>>> +	struct drm_gpu_scheduler *sched;
>>> +};
>>> +
>>> +/* Drivers should never call this directly */
>>> +static inline struct drm_sched_pending_job_iter
>>> +__drm_sched_pending_job_iter_begin(struct drm_gpu_scheduler *sched)
>>> +{
>>> +	struct drm_sched_pending_job_iter iter = {
>>> +		.sched = sched,
>>> +	};
>>> +
>>> +	WARN_ON(!drm_sched_is_stopped(sched));
>>> +	return iter;
>>> +}
>>> +
>>> +/* Drivers should never call this directly */
>>> +static inline void
>>> +__drm_sched_pending_job_iter_end(const struct drm_sched_pending_job_iter iter)
>>> +{
>>> +	WARN_ON(!drm_sched_is_stopped(iter.sched));
>>> +}
>>> +
>>> +DEFINE_CLASS(drm_sched_pending_job_iter, struct drm_sched_pending_job_iter,
>>> +	     __drm_sched_pending_job_iter_end(_T),
>>> +	     __drm_sched_pending_job_iter_begin(__sched),
>>> +	     struct drm_gpu_scheduler *__sched);
>>> +static inline void *
>>> +class_drm_sched_pending_job_iter_lock_ptr(class_drm_sched_pending_job_iter_t *_T)
>>> +{ return _T; }
>>> +#define class_drm_sched_pending_job_iter_is_conditional false
>>> +
>>> +/**
>>> + * drm_sched_for_each_pending_job() - Iterator for each pending job in scheduler
>>> + * @__job: Current pending job being iterated over
>>> + * @__sched: DRM scheduler to iterate over pending jobs
>>> + * @__entity: DRM scheduler entity to filter jobs, NULL indicates no filter
>>> + *
>>> + * Iterator for each pending job in scheduler, filtering on an entity, and
>>> + * enforcing scheduler is fully stopped
>>> + */
>>> +#define drm_sched_for_each_pending_job(__job, __sched, __entity)		\
>>> +	scoped_guard(drm_sched_pending_job_iter, (__sched))			\
>>> +		list_for_each_entry((__job), &(__sched)->pending_list, list)	\
>>> +			for_each_if(!(__entity) || (__job)->entity == (__entity))
>>> +
>>>  #endif
>>
>>
>> See my comments in the first patch. The documentation doesn't mention at
>> all why this new functionality exists or when and why users would be
>> expected to use it.
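For illustration only: a minimal usage sketch of the proposed iterator,
assuming the scheduler has already been fully stopped (the guard WARNs
otherwise). my_sched (a struct drm_gpu_scheduler *), my_entity (a struct
drm_sched_entity *) and my_cleanup_job() are hypothetical driver-side
names, not part of the patch:

    struct drm_sched_job *job;

    /* Walk every pending job belonging to one entity. */
    drm_sched_for_each_pending_job(job, my_sched, my_entity)
            my_cleanup_job(job);

Passing NULL as the entity drops the filter and walks the whole
pending list.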
>>
>> As far as I remember from XDC, both AMD and Intel overwrite a timed-out
>> job's buffer data in the rings on GPU reset. To do so, the driver needs
>> the timed-out job (passed through the timedout_job() callback) and then
>> needs all the pending non-broken jobs.
>>
>> AFAICS your patch provides a generic iterator over the entire
>> pending_list. How is a driver then supposed to determine which are the
>> non-broken jobs (just asking, but that needs to be documented)?
>>
>> Could it make sense to use a different iterator which only returns jobs
>> not belonging to the same context as the timed-out one?
>>
>> Those are important questions that need to be addressed before merging
>> this.
>>
>> And if this works canonically (i.e., for basically everyone), it needs
>> to be documented in drm_sched_resubmit_jobs() that this iterator is now
>> the canonical way of handling timeouts.
>>
>> Moreover, just yesterday I added an entry to the DRM todo list which
>> addresses drm_sched_resubmit_jobs(). If we merge this, that entry would
>> have to be removed, too.
>>
>>
>> @AMD: Would the code Matthew provides work for you? Please give your
>> input. This is very important common infrastructure.
>
> I don't think drm_sched_resubmit_jobs() can work for us without major
> rework. For our kernel queues, we have a single queue on which jobs
> for different clients are scheduled. When we reset the queue, we lose
> all jobs on the queue and have to re-emit the non-guilty ones. We do
> this at the ring level, i.e., we save the packets directly from the
> ring and then re-emit the packets for the non-guilty contexts to the
> freshly reset ring. This avoids running run_job() again, which would
> issue new fences and race with memory management, etc.
>
> I think the following would be workable:
> 1. The driver's timedout_job() callback flags the job as bad, resets the
>    bad queue, and calls drm_sched_resubmit_jobs().
> 2. drm_sched_resubmit_jobs() walks the pending list and calls run_job()
>    for every job.
Calling run_job() multiple times was one of the worst ideas I have ever
seen. The problem here is that you need a transactional approach to the
internal driver state, which is modified by ->run_job(). For example, if
you have:

	->run_job(A)
	->run_job(B)
	->run_job(C)

and after a reset you find that you need to re-submit only job B while
A & C are filtered out, then your driver state needs to be rolled back
to the point before job A was run.

> 3. The driver's run_job() callback looks to see if we already ran this
>    job and uses the original fence rather than allocating a new one.

Nope. The problem is that *all* drivers *must* use the original fence,
otherwise you always run into trouble. We should not promote a driver
interface which makes it extremely easy to shoot down the whole system.

> 4. The driver's run_job() callback checks to see if the job is guilty or
>    from the same context and, if so, sets an error on the fences and
>    submits only the fence packet to the queue so that any follow-up jobs
>    will properly synchronize if they need to wait on the fence from the
>    bad job.
> 5. The driver's run_job() callback will submit the full packet stream for
>    non-guilty contexts.
>
> I guess we could use the iterator and implement that logic in the
> driver directly rather than using drm_sched_resubmit_jobs().

Yeah, exactly that's the way to go.

Christian.

>
> Alex
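For reference, a rough sketch (not a definitive implementation) of the
driver-side handling discussed above, built on the proposed iterator: the
scheduler is assumed to be fully stopped, original hardware fences are
reused instead of calling run_job() again, and jobs from the guilty entity
only get an errored fence. All my_* helpers are hypothetical, and comparing
scheduler entities stands in for the "same context" check here:

#include <linux/dma-fence.h>
#include <drm/gpu_scheduler.h>

/*
 * Hypothetical helper, called from the driver's timeout handling once the
 * hung queue has been reset and the scheduler is fully stopped.
 */
static void my_replay_pending_jobs(struct drm_gpu_scheduler *sched,
				   struct drm_sched_job *bad)
{
	struct drm_sched_job *job;

	drm_sched_for_each_pending_job(job, sched, NULL) {
		if (job->entity == bad->entity) {
			/*
			 * Guilty entity: mark the job as failed and emit only
			 * its fence packet so dependent jobs still resolve.
			 */
			dma_fence_set_error(&job->s_fence->finished, -ETIME);
			my_emit_fence_packet_only(job);
		} else {
			/*
			 * Innocent job: re-emit the packets saved from the
			 * ring and reuse the original hardware fence instead
			 * of calling run_job() again.
			 */
			my_reemit_saved_packets(job);
		}
	}
}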
