Thank you for your contribution! Sashiko AI review found 1 potential issue(s) 
to consider:

Pre-existing issues:
- [High] The `run_job` callbacks return `NULL` instead of propagating the error 
pointer when `v3d_fence_create()` fails, masking submission failures and 
causing the DRM scheduler to treat the job as successfully completed.
--

commit f3ab1695e1c90559d6118895b31f3b02289aee81
Author: Maíra Canal <[email protected]>

drm/v3d: Clear queue->active_job when v3d_fence_create() fails

This commit clears queue->active_job in the run_job callbacks for BIN, RENDER,
TFU, and CSD when v3d_fence_create() fails. It creates a failure path to
prevent dangling pointers when returning from a failed job submission.

> diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
> index 0e266b29317f3..46f4fc09c59ec 100644
> --- a/drivers/gpu/drm/v3d/v3d_sched.c
> +++ b/drivers/gpu/drm/v3d/v3d_sched.c
[ ... ]
> @@ -214,7 +210,7 @@ static struct dma_fence *v3d_bin_job_run(struct 
> drm_sched_job *sched_job)
>  
>       fence = v3d_fence_create(v3d, V3D_BIN);
>       if (IS_ERR(fence))
> -             return NULL;
> +             goto out_clean_job;
[ ... ]
> @@ -242,6 +238,12 @@ static struct dma_fence *v3d_bin_job_run(struct 
> drm_sched_job *sched_job)
>       V3D_CORE_WRITE(0, V3D_CLE_CT0QEA, job->end);
>  
>       return fence;
> +
> +out_clean_job:
> +     spin_lock_irqsave(&queue->queue_lock, irqflags);
> +     queue->active_job = NULL;
> +     spin_unlock_irqrestore(&queue->queue_lock, irqflags);
> +     return NULL;

[Severity: High]
This is a pre-existing issue, but does returning NULL here mask the
submission failure from the DRM scheduler?

When v3d_fence_create() fails, it returns an error pointer. If the
v3d_bin_job_run() callback (and the other run_job callbacks modified
in this patch) returns NULL instead of propagating this error pointer,
the DRM scheduler treats the job as having completed synchronously and
successfully.

Could this allow dependent jobs to execute as if the hardware successfully
processed the command lists, potentially leading to GPU page faults or
userspace crashes? Should the callbacks propagate the error pointer back
to the scheduler instead?

-- 
Sashiko AI review · 
https://sashiko.dev/#/patchset/[email protected]?part=2

Reply via email to