Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
Pre-existing issues: - [High] Incorrect client context handling and missing locks for `abo->client` can cause NULL pointer dereferences and cross-process handle lookup failures. - [Medium] Data race in `amdxdna_gem_vmap` due to double-checked locking without memory barriers. -- commit 92210e4afeba3f188842fc3d0b7a0121878a8420 Author: Shuvam Pandey <[email protected]> accel/amdxdna: Validate sync BO range before flushing amdxdna_drm_sync_bo_ioctl() takes offset and size from userspace and uses them to build the address and length passed to drm_clflush_virt_range() for BOs that have a kernel mapping. The values are not checked against the BO size, so an out-of-range request can pass an address outside the BO mapping to cache maintenance. > diff --git a/drivers/accel/amdxdna/amdxdna_gem.c > b/drivers/accel/amdxdna/amdxdna_gem.c > index 6e367ddb9e1be..c56c8fd86276b 100644 > --- a/drivers/accel/amdxdna/amdxdna_gem.c > +++ b/drivers/accel/amdxdna/amdxdna_gem.c > @@ -1040,20 +1040,33 @@ int amdxdna_drm_sync_bo_ioctl(struct drm_device *dev, > } > abo = to_xdna_obj(gobj); > > + if (args->offset > gobj->size || > + args->size > gobj->size - args->offset) { > + ret = -EINVAL; > + goto put_obj; > + } > + > ret = amdxdna_gem_pin(abo); > if (ret) { > XDNA_ERR(xdna, "Pin BO %d failed, ret %d", args->handle, ret); > goto put_obj; > } > > - if (is_import_bo(abo)) > - drm_clflush_sg(abo->base.sgt); > - else if (amdxdna_gem_vmap(abo)) > - drm_clflush_virt_range(amdxdna_gem_vmap(abo) + args->offset, > args->size); > - else if (abo->base.pages) > - drm_clflush_pages(abo->base.pages, gobj->size >> PAGE_SHIFT); > - else > - drm_WARN(&xdna->ddev, 1, "Can not get flush memory"); > + if (args->size) { > + if (is_import_bo(abo)) { > + drm_clflush_sg(abo->base.sgt); > + } else { > + void *kva = amdxdna_gem_vmap(abo); [Severity: Medium] This is a pre-existing issue, but does amdxdna_gem_vmap() have a data race on abo->mem.kva? Looking at amdxdna_gem_vmap(), it implements a double-checked locking pattern by checking for kva without synchronization before falling back to the lock: if (abo->mem.kva) return abo->mem.kva; /* The first call to get the kva, taking slow path. */ guard(mutex)(&abo->lock); When the mapping is created, it assigns the pointer using a plain write: abo->mem.kva = map.vaddr; This violates the Linux Kernel Memory Model and will trigger KCSAN. Without READ_ONCE()/WRITE_ONCE() or smp_load_acquire()/smp_store_release(), the compiler could potentially reorder the load/store and risk exposing an uninitialized pointer to concurrent readers. > + > + if (kva) > + drm_clflush_virt_range(kva + args->offset, > + args->size); > + else if (abo->base.pages) > + drm_clflush_pages(abo->base.pages, gobj->size > >> PAGE_SHIFT); > + else > + drm_WARN(&xdna->ddev, 1, "Can not get flush > memory"); > + } > + } > > amdxdna_gem_unpin(abo); > [Severity: High] This isn't a bug introduced by this patch, but is it safe to pass abo->client without holding abo->lock later in this function? At the end of amdxdna_drm_sync_bo_ioctl(), the code calls: ret = amdxdna_hwctx_sync_debug_bo(abo->client, args->handle); If a concurrent thread closes the GEM handle, it drops abo->open_ref and sets abo->client = NULL. The sync ioctl thread will then pass this NULL pointer to amdxdna_hwctx_sync_debug_bo(), which immediately dereferences it: drivers/accel/amdxdna/amdxdna_ctx.c:amdxdna_hwctx_sync_debug_bo() { struct amdxdna_dev *xdna = client->xdna; ... gobj = drm_gem_object_lookup(client->filp, debug_bo_hdl); ... } This can cause a kernel crash. A similar NULL dereference also exists in the error path of amdxdna_gem_vmap() when doing: XDNA_ERR(abo->client->xdna, "Vmap bo failed, ret %d", ret); Additionally, for shared BOs (like via PRIME), abo->client remains assigned to the original creator. If another process imports and syncs the BO, passing abo->client causes drm_gem_object_lookup() to look up the caller's handle ID in the original creator's handle space. This can lead to lookup failures or silently syncing an unrelated BO. Should this ioctl use the caller's filp->driver_priv as the client context instead of relying on the BO's creator? -- Sashiko AI review ยท https://sashiko.dev/#/patchset/[email protected]?part=1
