Thank you for your contribution! Sashiko AI review found 5 potential issue(s) to consider:
Pre-existing issues: - [Critical] Error path cleans up DRM scheduler jobs that have already been pushed, causing Use-After-Free and scheduler corruption. - [Critical] Uninitialized locking context is passed to fini on the success path for CPU jobs without BOs, leading to a kernel panic. - [High] BO reservation locks acquired for `clean_job` are leaked if subsequent validation fails. - [High] Userspace can trigger a kernel deadlock by specifying the same BO handle in both the CPU job and the CSD job. - [High] GFP_KERNEL allocation while holding dma_resv locks causes an AB-BA deadlock with memory reclaim. -- commit e7875ac22f6ebc1c0172f365953ce0df35e9465b Author: Maíra Canal <[email protected]> drm/v3d: Make v3d_get_cpu_indirect_csd_params() a pure parser This commit modifies v3d_get_cpu_indirect_csd_params() to only parse the indirect CSD extension and stash the arguments. The creation of CSD and clean jobs, along with their BO locking, is moved into v3d_submit_cpu_ioctl() to prepare for the unified submission chain. > diff --git a/drivers/gpu/drm/v3d/v3d_submit.c > b/drivers/gpu/drm/v3d/v3d_submit.c > index fcaf3a6cfddc6..64eba912dc646 100644 > --- a/drivers/gpu/drm/v3d/v3d_submit.c > +++ b/drivers/gpu/drm/v3d/v3d_submit.c [ ... ] > @@ -1404,6 +1403,17 @@ v3d_submit_cpu_ioctl(struct drm_device *dev, void > *data, > goto fail; > } > > + if (cpu_job->job_type == V3D_CPU_JOB_TYPE_INDIRECT_CSD) { > + ret = v3d_setup_csd_jobs_and_bos(file_priv, v3d, > + &cpu_job->indirect_csd.args, > + &cpu_job->indirect_csd.job, > + > &cpu_job->indirect_csd.clean_job, > + NULL, > + &cpu_job->indirect_csd.exec); > + if (ret) > + goto fail; > + } > + [Severity: High] This isn't a bug introduced by this patch, but does this create a lock leak on error paths? If v3d_setup_csd_jobs_and_bos() successfully locks BO reservations using cpu_job->indirect_csd.exec, and later in v3d_submit_cpu_ioctl() a failure occurs (for example, in v3d_lookup_bos() or v3d_lock_bo_reservations()), the error handling jumps to fail: v3d_submit_cpu_ioctl() { ... if (ret) goto fail; ... } This appears to bypass the fail_unreserve: label, leaving the ww_mutex and dma_resv locks permanently held. Could this cause subsequent GPU operations attempting to use these BOs to stall indefinitely? [Severity: High] This is a pre-existing issue, but could this lead to a deadlock if userspace specifies the same BO handle in both the CPU job and the CSD job? When v3d_submit_cpu_ioctl() uses two separate locking contexts in the same thread, the first context acquires the WW mutexes for the CSD job's BOs: ret = v3d_setup_csd_jobs_and_bos(..., &cpu_job->indirect_csd.exec); The second locking context then attempts to acquire WW mutexes for the CPU job's BOs: ret = v3d_lock_bo_reservations(&cpu_job->base, &acquire_ctx); If the same BO is specified in both jobs, would the second context (younger transaction) block indefinitely attempting to acquire a WW mutex already held by the first context, leading to an unrecoverable self-deadlock? [Severity: Critical] This is a pre-existing issue, but is it safe to clean up DRM scheduler jobs that have already been pushed? In v3d_submit_cpu_ioctl(), once jobs are pushed to the DRM scheduler via v3d_push_job(), the scheduler takes ownership of their lifecycle. If a subsequent operation fails: ret = drm_sched_job_add_dependency(&clean_job->base, ...); if (ret) goto fail_unreserve; The error path jumps to fail_unreserve: and then fail:, calling v3d_job_cleanup() on the already-pushed jobs. Does calling drm_sched_job_cleanup() on a pushed job destroy its s_fence and dependencies xarray while the background scheduler thread might still be accessing them, potentially causing a use-after-free and scheduler state corruption? [Severity: Critical] This isn't a bug introduced by this patch, but is it possible for an uninitialized locking context to be passed to the cleanup functions? In v3d_submit_cpu_ioctl(), the local locking context is bypassed if the CPU job has no BOs: if (args->bo_handle_count) { ... ret = v3d_lock_bo_reservations(&cpu_job->base, &acquire_ctx); } The function then proceeds to the success path and calls: drm_gem_unlock_reservations(job->bo, job->bo_count, acquire_ctx); Could this result in passing uninitialized stack memory to drm_exec_fini() or ww_acquire_fini(), causing the cleanup function to dereference and free garbage pointers? [Severity: High] This is a pre-existing issue, but does this allocation trigger an AB-BA deadlock with memory reclaim? In v3d_submit_cpu_ioctl(), dependencies are added while BOs are locked: ret = drm_sched_job_add_dependency(&csd_job->base.base, dma_fence_get(cpu_job->base.done_fence)); This internally uses xa_alloc() with the GFP_KERNEL flag while the thread holds dma_resv locks acquired by v3d_lock_bo_reservations(). If system memory is tight, could the memory allocator trigger the shrinker, which would attempt to evict GEM objects and acquire dma_resv locks, leading to a deadlock? > clean_job = cpu_job->indirect_csd.clean_job; > csd_job = cpu_job->indirect_csd.job; > -- Sashiko AI review · https://sashiko.dev/#/patchset/[email protected]?part=9
