Just FYI, I'm pulling this into drm-fixes as is, since it fixes the regression and avoids the revert. However, please keep discussing until we are sure things are right; we can deal with any fixes in a follow-up patch.

Dave.

On Fri, 26 Jan 2024 at 03:32, Matthew Brost <[email protected]> wrote:
>
> On Thu, Jan 25, 2024 at 10:24:24AM +0100, Vlastimil Babka wrote:
> > On 1/24/24 22:08, Matthew Brost wrote:
> > > All entities must be drained in the DRM scheduler run job worker to
> > > avoid the following case. An entity found that is ready, no job found
> > > ready on entity, and run job worker goes idle with other entities + jobs
> > > ready. Draining all ready entities (i.e. loop over all ready entities)
> > > in the run job worker ensures all job that are ready will be scheduled.
> > >
> > > Cc: Thorsten Leemhuis <[email protected]>
> > > Reported-by: Mikhail Gavrilov <[email protected]>
> > > Closes:
> > > https://lore.kernel.org/all/CABXGCsM2VLs489CH-vF-1539-s3in37=bwuowtoeee+q26z...@mail.gmail.com/
> > > Reported-and-tested-by: Mario Limonciello <[email protected]>
> > > Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3124
> > > Link:
> > > https://lore.kernel.org/all/[email protected]/
> > > Reported-by: Vlastimil Babka <[email protected]>
> >
> > Can change to Reported-and-tested-by: Vlastimil Babka <[email protected]>
> >
>
> +1, got it.
>
> Matt
>
> > Thanks!
> >
> > > Closes:
> > > https://lore.kernel.org/dri-devel/[email protected]/T/#m0c31d4d1b9ae9995bb880974c4f1dbaddc33a48a
> > > Signed-off-by: Matthew Brost <[email protected]>
> > > ---
> > >  drivers/gpu/drm/scheduler/sched_main.c | 15 +++++++--------
> > >  1 file changed, 7 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c
> > > b/drivers/gpu/drm/scheduler/sched_main.c
> > > index 550492a7a031..85f082396d42 100644
> > > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > > @@ -1178,21 +1178,20 @@ static void drm_sched_run_job_work(struct
> > > work_struct *w)
> > >  	struct drm_sched_entity *entity;
> > >  	struct dma_fence *fence;
> > >  	struct drm_sched_fence *s_fence;
> > > -	struct drm_sched_job *sched_job;
> > > +	struct drm_sched_job *sched_job = NULL;
> > >  	int r;
> > >
> > >  	if (READ_ONCE(sched->pause_submit))
> > >  		return;
> > >
> > > -	entity = drm_sched_select_entity(sched);
> > > +	/* Find entity with a ready job */
> > > +	while (!sched_job && (entity = drm_sched_select_entity(sched))) {
> > > +		sched_job = drm_sched_entity_pop_job(entity);
> > > +		if (!sched_job)
> > > +			complete_all(&entity->entity_idle);
> > > +	}
> > >  	if (!entity)
> > > -		return;
> > > -
> > > -	sched_job = drm_sched_entity_pop_job(entity);
> > > -	if (!sched_job) {
> > > -		complete_all(&entity->entity_idle);
> > >  		return;	/* No more work */
> > > -	}
> > >
> > >  	s_fence = sched_job->s_fence;
> > >
> >
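
For anyone following the thread, the failure case from the commit message can be sketched in userspace roughly as below. This is only an illustration, not the scheduler code: the mock_* types and helpers and the run_job_old()/run_job_drain() names are made up here, the complete_all() signalling is left out, and it assumes that an entity stops looking ready once a pop on it has been attempted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct drm_sched_entity. */
struct mock_entity {
	int ready;	/* non-zero while the entity looks schedulable */
	int job_id;	/* 0 means no job can be popped right now */
};

/* Hypothetical stand-in for the scheduler's list of entities. */
struct mock_sched {
	struct mock_entity *entities;
	size_t count;
};

/* Stand-in for drm_sched_select_entity(): first entity that looks ready. */
static struct mock_entity *mock_select_entity(struct mock_sched *s)
{
	for (size_t i = 0; i < s->count; i++)
		if (s->entities[i].ready)
			return &s->entities[i];
	return NULL;
}

/*
 * Stand-in for drm_sched_entity_pop_job(): returns a job id, or 0 if the
 * entity has nothing to run. Assumption: the entity no longer looks ready
 * after the attempt, so it is not selected again.
 */
static int mock_pop_job(struct mock_entity *e)
{
	int job;

	assert(e != NULL);
	job = e->job_id;
	e->job_id = 0;
	e->ready = 0;
	return job;
}

/* Pre-patch flow: give up as soon as the first selected entity has no job. */
int run_job_old(struct mock_sched *s)
{
	struct mock_entity *e = mock_select_entity(s);

	if (!e)
		return 0;
	return mock_pop_job(e);
}

/* Patched flow: keep selecting entities until a job is found or none are left. */
int run_job_drain(struct mock_sched *s)
{
	struct mock_entity *e;
	int job = 0;

	while (!job && (e = mock_select_entity(s)))
		job = mock_pop_job(e);
	return job;
}
```

With two entities that both look ready, where only the second one actually has a job, run_job_old() returns without a job and leaves the second entity pending, while run_job_drain() moves on and picks that job up.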
