>Thank you for your contribution! Sashiko AI review found 4 potential issue(s) 
>to consider:



>



>New issues:



>- [High] Unhandled interrupt storm during device removal due to 
>`pm_runtime_get_if_active()`.
I try to fix it  in next version of patches.


>- [High] NULL pointer dereference in the IRQ thread during teardown 
>(Use-After-Free).



Regarding the NULL pointer dereference during teardown concern:

This cannot occur in the current patch. The threaded IRQ handler has three 
layers of protection:

pm_runtime_get_if_active() == 1 (line 423): In rocket_core_fini(), 
pm_runtime_disable() is called before core->iommu_group = NULL. After 
pm_runtime_disable(), disable_depth > 0, so pm_runtime_get_if_active() returns 
-EINVAL. Since -EINVAL != 1 is false, the threaded handler never calls 
rocket_job_handle_irq() and thus never accesses core->iommu_group. This holds 
even if the thread is woken by coalescing after pm_runtime_disable().
scoped_guard(mutex, &core->job_lock) with in_flight_job check (line 345): Even 
if the first check were somehow bypassed, rocket_job_handle_irq() only accesses 
core->iommu_group inside a scoped_guard that checks core->in_flight_job != 
NULL. By the time rocket_core_fini() runs, drm_sched_fini() has already been 
called (via rocket_job_fini()), which stops the scheduler and releases all 
jobs, leaving in_flight_job as NULL.
Execution ordering guarantee: rocket_core_fini() calls pm_runtime_disable() 
before setting iommu_group = NULL. The pm_runtime_get_if_active() check in the 
thread handler is evaluated under spin_lock_irqsave (see 
pm_runtime_get_conditional() at runtime.c:1233), which provides the 
happens-before guarantee with respect to disable_depth modification in 
__pm_runtime_disable().
No code path exists where the threaded handler can reach 
iommu_detach_group(NULL, core->iommu_group) after core->iommu_group has been 
set to NULL.




>Pre-existing issues:



>- [Critical] Double free of `rjob->tasks` array during job submission error 
>handling.



>- [High] NULL pointer dereference when freeing an uninitialized `job->domain`.



>--



>



>commit 7c59a4d5e2e74264bcfe7bc8aa7c903516b1c1c0



>Author: ZhaoJinming <[email protected]>



>



>accel/rocket: Fix iommu_group leak and unsafe IRQ register access



>



>This commit replaces iommu_group_get() with the cached core->iommu_group



>to prevent a reference leak. It also adds pm_runtime_get_if_active() checks



>in the IRQ handler and thread to prevent unsafe hardware register accesses



>when the device is suspended.



>



>> diff --git a/drivers/accel/rocket/rocket_job.c 
>> b/drivers/accel/rocket/rocket_job.c



>> index cf9f61682dd8b..21b433826b479 100644



>> --- a/drivers/accel/rocket/rocket_job.c



>> +++ b/drivers/accel/rocket/rocket_job.c



>> @@ -349,7 +349,7 @@ static void rocket_job_handle_irq(struct rocket_core 
>> *core)



>>?                             return;



>>?                     }



>>? 



>> -                    iommu_detach_group(NULL, iommu_group_get(core->dev));



>> +                    iommu_detach_group(NULL, core->iommu_group);



>



>[Severity: High]



>Could this lead to a NULL pointer dereference during device teardown?



>



>In rocket_core_fini(), core->iommu_group is set to NULL without acquiring



>core->job_lock or waiting for the IRQ thread to finish:



>



>rocket_core_fini()



>??? ...



>??? core->iommu_group = NULL;



>



>If the threaded IRQ handler runs concurrently after passing the runtime PM



>check, it will pass this newly nulled pointer to iommu_detach_group(),



>causing the IOMMU core to dereference a NULL pointer when acquiring the



>group's mutex.



>



>>?                     dma_fence_signal(core->in_flight_job->done_fence);



>>?                     pm_runtime_put_autosuspend(core->dev);



>>?                     core->in_flight_job = NULL;



>



>[ ... ]



>



>> @@ -428,16 +431,24 @@ static irqreturn_t rocket_job_irq_handler_thread(int 
>> irq, void *data)



>>? static irqreturn_t rocket_job_irq_handler(int irq, void *data)



>>? {



>>?     struct rocket_core *core = data;



>> -    u32 raw_status = rocket_pc_readl(core, INTERRUPT_RAW_STATUS);



>> +    u32 raw_status;



>> +



>> +    if (pm_runtime_get_if_active(core->dev) != 1)



>> +            return IRQ_NONE;



>



>[Severity: High]



>Will this cause an unhandled interrupt storm during module removal?



>



>During module removal, rocket_core_fini() disables runtime PM:



>



>rocket_core_fini()



>??? pm_runtime_disable(core->dev);



>



>This happens before the hardware is stopped. If a hardware job completes



>subsequently, the device raises a level-triggered interrupt. Since runtime



>PM is disabled, pm_runtime_get_if_active() returns -EINVAL.



>



>The handler then unconditionally returns IRQ_NONE without clearing the



>interrupt. Because the IRQ is level-triggered, the unacknowledged interrupt



>will immediately refire upon handler exit, potentially causing an infinite



>interrupt storm that deadlocks the CPU.



>



>[Severity: Critical]



>This isn't a bug introduced by this patch, but does the error handling in



>rocket_copy_tasks() cause a double free of the tasks array?



>



>If validation of the incoming tasks fails, the array is freed but the



>pointer is not cleared:



>



>rocket_copy_tasks() {



>??? ...



>fail:



>??? kvfree(rjob->tasks);



>??? return ret;



>}



>



>The caller then jumps to the error label and invokes rocket_job_put(),



>which triggers rocket_job_cleanup(). This cleanup function frees the



>dangling pointer again:



>



>rocket_job_cleanup() {



>??? ...



>??? kvfree(job->tasks);



>}



>



>Can a local unprivileged attacker exploit this via the submit ioctl to



>corrupt kernel memory?



>



>[Severity: High]



>This isn't a bug introduced by this patch, but is there a NULL pointer



>dereference when freeing an uninitialized domain?



>



>When the job structure is allocated, all fields are initialized to zero.



>If setup fails early (for example, in drm_sched_job_init() or



>rocket_copy_tasks()), job->domain remains NULL.



>



>The error handling path invokes rocket_job_cleanup(), which unconditionally



>releases the domain:



>



>rocket_job_cleanup() {



>??? ...



>??? rocket_iommu_domain_put(job->domain);



>}



>



>This function immediately dereferences the pointer:



>



>rocket_iommu_domain_put() {



>??? ...



>??? kref_put(&domain->kref, rocket_iommu_domain_destroy);



>}



>



>Could this cause a kernel panic if an unprivileged user passes invalid



>arguments?



>



>-- 



>Sashiko AI review ・ 
>https://sashiko.dev/#/patchset/[email protected]?part=2



>


Reply via email to