On 6/5/26 11:18, [email protected] wrote:
Thank you for your contribution! Sashiko AI review found 4 potential issue(s) 
to consider:

New issues:
- [High] Dropping dev_lock during amdxdna_remove() list iteration causes a 
Use-After-Free if a file is closed concurrently.
- [High] Deferring IOMMU SVA unbind to amdxdna_drm_close() causes a NULL 
pointer dereference in the IOMMU core.

Agreed. I will provide a simpler fix in v2 patch.

Lizhi


Pre-existing issues:
- [High] Deferred GEM object freeing accesses destroyed IOMMU domain 
structures, causing a Use-After-Free.
- [High] Deferred GEM object freeing calls flush_workqueue() on a destroyed 
workqueue pointer.
--

commit ec7bb08ec6d7a0837cd580d24a3f76e7ed06a6c6
Author: Lizhi Hou <[email protected]>

accel/amdxdna: Fix amdxdna_client lifetime race during device removal

This patch changes the client cleanup logic during device removal.
It decouples hardware teardown from client cleanup to avoid a use-after-free
if amdxdna_drm_close() is called concurrently with amdxdna_remove().

diff --git a/drivers/accel/amdxdna/amdxdna_pci_drv.c 
b/drivers/accel/amdxdna/amdxdna_pci_drv.c
index c677293c1ae7f..3016db07c7cad 100644
--- a/drivers/accel/amdxdna/amdxdna_pci_drv.c
+++ b/drivers/accel/amdxdna/amdxdna_pci_drv.c
@@ -167,18 +167,12 @@ static void amdxdna_drm_close(struct drm_device *ddev, 
struct drm_file *filp)
  {
        struct amdxdna_client *client = filp->driver_priv;
        struct amdxdna_dev *xdna = to_xdna_dev(ddev);
-       int idx;
XDNA_DBG(xdna, "closing pid %d", client->pid); - if (!drm_dev_enter(&xdna->ddev, &idx))
-               return;
-
        mutex_lock(&xdna->dev_lock);
        amdxdna_client_cleanup(client);
[Severity: High]
Since amdxdna_client_cleanup() is now deferred until file close, could this
trigger a NULL pointer dereference during SVA teardown?

When a device is removed, the PCI subsystem unlinks it from its IOMMU group.
Later, when this file is closed, amdxdna_sva_fini() calls
iommu_sva_unbind_device().

Inside the IOMMU core, iommu_detach_device_pasid() dereferences the group
pointer:

        struct iommu_group *group = dev->iommu_group;

        mutex_lock(&group->mutex);

If the device was already removed, dev->iommu_group would be NULL, causing a
crash. Does the SVA unbind need to happen earlier during device removal?

        mutex_unlock(&xdna->dev_lock);
-
-       drm_dev_exit(idx);
  }
[ ... ]
@@ -436,14 +430,8 @@ static void amdxdna_remove(struct pci_dev *pdev)
        amdxdna_sysfs_fini(xdna);
mutex_lock(&xdna->dev_lock);
-       client = list_first_entry_or_null(&xdna->client_list,
-                                         struct amdxdna_client, node);
-       while (client) {
-               amdxdna_client_cleanup(client);
-
-               client = list_first_entry_or_null(&xdna->client_list,
-                                                 struct amdxdna_client, node);
-       }
+       list_for_each_entry(client, &xdna->client_list, node)
[Severity: High]
Does this list_for_each_entry() loop have a use-after-free risk if the lock
is dropped internally?

The loop calls amdxdna_hwctx_remove_all(), which invokes aie2_hwctx_fini().
Inside aie2_hwctx_fini(), xdna->dev_lock is temporarily dropped to wait for
hardware idle:

        mutex_unlock(&xdna->dev_lock);

Since this patch removes the drm_dev_enter() safeguard from
amdxdna_drm_close(), a concurrent file close can acquire dev_lock while it
is dropped here. The concurrent close would remove the client from the list
via list_del() and free the client structure.

When the lock is re-acquired and list_for_each_entry() attempts to fetch the
next element, won't it dereference the freed client->node.next pointer?

+               amdxdna_hwctx_remove_all(client);
xdna->dev_info->ops->fini(xdna);
        mutex_unlock(&xdna->dev_lock);
[Severity: High]
This is a pre-existing issue, but does this function leave xdna->domain
dangling, leading to a use-after-free on deferred file close?

During device removal, amdxdna_remove() calls amdxdna_iommu_fini(), which
frees the domain but leaves the pointer intact:

        iommu_domain_free(xdna->domain);

If the application keeps the file descriptor open, drm_gem_object_put() will
later invoke amdxdna_gem_obj_free() during cleanup. This calls
amdxdna_dma_unmap_bo():

        if (amdxdna_iova_on(xdna)) {
                size = iova_align(&xdna->iovad, abo->mem.size);
                iommu_unmap(xdna->domain, abo->mem.dma_addr, size);

Since xdna->domain is not NULL, amdxdna_iova_on() returns true, and
iommu_unmap() is called on the freed domain. Should the domain pointer
be set to NULL after being freed?


[Severity: High]
This is a pre-existing issue, but could deferred GEM cleanup trigger a
use-after-free on the notifier workqueue?

Earlier in amdxdna_remove(), destroy_workqueue(xdna->notifier_wq) frees the
workqueue structure.

When a process retains an open file descriptor past device removal, the DRM
core defers GEM cleanup to the eventual file close. When
drm_gem_object_put() calls amdxdna_gem_obj_free(), it unconditionally does:

        flush_workqueue(xdna->notifier_wq);

Since the workqueue was already destroyed and freed, won't this result in
a use-after-free crash?

Reply via email to