Add an entry to the TODO file noting that drm_sched_resubmit_jobs() still lacks a successor.
Signed-off-by: Philipp Stanner <[email protected]>
---
 drivers/gpu/drm/scheduler/TODO | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/drivers/gpu/drm/scheduler/TODO b/drivers/gpu/drm/scheduler/TODO
index 79044adb7d01..713dd62c58da 100644
--- a/drivers/gpu/drm/scheduler/TODO
+++ b/drivers/gpu/drm/scheduler/TODO
@@ -10,3 +10,29 @@
   - Tasks:
     1. Read the example entry.
     2. Remove the entry once solved (never in this case)
+
+* GPU job resubmits
+  - Difficulty: hard
+  - Contact:
+    - Christian König <[email protected]>
+    - Philipp Stanner <[email protected]>
+  - Description:
+    drm_sched_resubmit_jobs() is deprecated. The main reason is that it leads to
+    reinitializing dma_fences. See that function's documentation for details. The
+    better approach for valid resubmissions by amdgpu and Xe is (apparently) to
+    figure out which job (and, by association, which entity) caused the hang.
+    Then the job's buffer data, together with the buffer data of all other jobs
+    currently in the same hardware ring, must be invalidated. This can be done,
+    for example, by overwriting it.
+    amdgpu currently determines which jobs are in the ring and need to be
+    overwritten by keeping copies of the jobs. Xe obtains that information by
+    directly accessing drm_sched's pending_list.
+  - Tasks:
+    1. Implement scheduler functionality through which the driver can obtain
+       information about which *broken* jobs are currently in the hardware
+       ring.
+    2. Such infrastructure would then typically be used in
+       drm_sched_backend_ops.timedout_job(). Document that.
+    3. Port a driver as the first user.
+    4. Document the new alternative in the documentation of the deprecated
+       drm_sched_resubmit_jobs().
-- 
2.49.0
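The recovery approach described in the TODO entry (find the job that caused the hang, then invalidate the buffer data of every job still in the hardware ring by overwriting it) can be illustrated with a small userspace C sketch. It assumes a toy model in which a job counts as "in the ring" while its hardware fence has not signaled, and it treats the oldest such job as the guilty one; that heuristic is an assumption for illustration, not how amdgpu or Xe actually decide. All types and functions (struct toy_job, struct toy_ring, toy_find_guilty_job(), toy_invalidate_ring_jobs()) are hypothetical and are not drm_sched API; the real scheduler infrastructure does not exist yet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy model for illustration only; none of these names exist in drm_sched. */
#define TOY_PAYLOAD_SIZE 16
#define TOY_NOP 0x00u /* assumed "no-op" byte used to overwrite command data */

struct toy_job {
	int entity_id;                           /* entity that submitted the job */
	bool hw_fence_signaled;                  /* true once the hardware finished the job */
	unsigned char payload[TOY_PAYLOAD_SIZE]; /* the job's command buffer data */
};

/* Jobs in submission order; unsignaled jobs are treated as still in the ring. */
struct toy_ring {
	struct toy_job *jobs;
	size_t count;
};

/*
 * Assumed heuristic: the oldest job whose hardware fence has not signaled
 * caused the hang. Returns NULL if every job has completed.
 */
struct toy_job *toy_find_guilty_job(struct toy_ring *ring)
{
	assert(ring != NULL);
	for (size_t i = 0; i < ring->count; i++)
		if (!ring->jobs[i].hw_fence_signaled)
			return &ring->jobs[i];
	return NULL;
}

/*
 * Invalidate the buffer data of every job still in the ring (the guilty job
 * and all others behind it) by overwriting it with TOY_NOP bytes.
 * Returns the number of jobs invalidated.
 */
size_t toy_invalidate_ring_jobs(struct toy_ring *ring)
{
	size_t n = 0;

	assert(ring != NULL);
	for (size_t i = 0; i < ring->count; i++) {
		struct toy_job *job = &ring->jobs[i];

		if (job->hw_fence_signaled)
			continue; /* already completed, leave its data alone */
		memset(job->payload, TOY_NOP, sizeof(job->payload));
		n++;
	}
	return n;
}
```

In a real driver, the equivalent logic would presumably run from drm_sched_backend_ops.timedout_job(), as task 2 of the entry suggests; the part this sketch fakes with a plain array, reliably knowing which jobs are currently in the hardware ring, is exactly what task 1 asks the scheduler to provide.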
