Thank you for your contribution! Sashiko AI review found 3 potential issue(s) 
to consider:

Pre-existing issues:
- [High] Calling `drm_sched_job_cleanup()` after `drm_sched_job_arm()` violates 
the DRM scheduler API contract.
- [High] The runtime suspend callback fails to synchronize the shared IRQ 
before powering down.
- [High] Potential NULL pointer dereference in `iommu_attach_group()` if the 
device lacks an IOMMU group.
--

commit a91812825621124e6ae3a091adc6205cbeefed80
Author: ZhaoJinming <[email protected]>

accel/rocket: Fix error path handling in rocket_job_run()

This commit fixes multiple issues in the error paths of rocket_job_run(). It
resolves a dma_fence reference leak, correctly balances the runtime PM
usage counter on failure by switching to pm_runtime_resume_and_get(), and
returns ERR_PTR(ret) instead of an unsignaled fence to avoid DRM scheduler
warnings.

> diff --git a/drivers/accel/rocket/rocket_job.c 
> b/drivers/accel/rocket/rocket_job.c
> index 2f1861f960ccb..cf9f61682dd8b 100644
> --- a/drivers/accel/rocket/rocket_job.c
> +++ b/drivers/accel/rocket/rocket_job.c
> @@ -310,13 +310,22 @@ static struct dma_fence *rocket_job_run(struct 
> drm_sched_job *sched_job)

[ ... ]

>       ret = iommu_attach_group(job->domain->domain, core->iommu_group);
> -     if (ret < 0)
> -             return fence;
> +     if (ret < 0) {
> +             pm_runtime_put(core->dev);
> +             dma_fence_put(job->done_fence);
> +             job->done_fence = NULL;
> +             dma_fence_put(fence);
> +             return ERR_PTR(ret);
> +     }

[Severity: High]
This is a pre-existing issue, but can core->iommu_group be NULL here?

If the device does not have an IOMMU group (e.g., IOMMU is disabled or not
present in the device tree), iommu_group_get() returns NULL. Calling
iommu_attach_group() unconditionally with a NULL group parameter will cause a
kernel crash when it attempts to dereference the pointer.

Should there be a check for a valid iommu_group before attaching?

[ ... ]

I also noticed a couple of pre-existing issues in this driver:

[Severity: High]
This isn't a bug introduced by this patch, but is it safe to call
drm_sched_job_cleanup() after drm_sched_job_arm() has already been called?

In drivers/accel/rocket/rocket_job.c:rocket_job_push(), if
rocket_acquire_object_fences() fails:

        drm_sched_job_arm(&job->base);
        ...
        ret = rocket_acquire_object_fences(job->in_bos, job->in_bo_count,
                                           &job->base, false);
        if (ret)
                goto err_unlock;

The error path returns to rocket_ioctl_submit_job(), which then calls:

out_cleanup_job:
        if (ret)
                drm_sched_job_cleanup(&rjob->base);

The DRM scheduler API contract states that drm_sched_job_arm() is a point of
no return that initializes fences and sequence numbers. Calling
drm_sched_job_cleanup() after this breaks the dma_fence sequence numbers,
potentially causing any context waiting on the aborted sequence number to hang
indefinitely.

[Severity: High]
This is a pre-existing issue, but does the runtime suspend callback need to
synchronize the shared IRQ before powering down?

In drivers/accel/rocket/rocket_drv.c:rocket_device_runtime_suspend():

        clk_bulk_disable_unprepare(ARRAY_SIZE(rdev->cores[core].clks),
                                   rdev->cores[core].clks);

The Rocket driver registers its IRQ handler with IRQF_SHARED. According to
the power management subsystem rules, drivers using shared IRQs must call
synchronize_irq() in their runtime suspend callback before disabling clocks.

Without this, could the hardware be powered off while the IRQ handler is
mid-flight and reading registers, potentially leading to system hangs or
spurious wakeups?

-- 
Sashiko AI review ยท 
https://sashiko.dev/#/patchset/[email protected]?part=1

Reply via email to