Thank you for your contribution! Sashiko AI review found 8 potential issue(s) to consider:
New issues: - [High] `fastrpc_inv_args()` incorrectly passes `DMA_TO_DEVICE` to `dma_buf_end_cpu_access()` after beginning access with `DMA_FROM_DEVICE`. - [High] Missing cache maintenance for output buffers before DSP execution. - [High] The return value of `dma_buf_begin_cpu_access()` is ignored. - [High] Misuse of `dma_buf_begin_cpu_access()` to perform cache maintenance for the DSP. - [High] Cache maintenance is incorrectly skipped for shared IN/OUT buffers due to flawed overlap logic. - [Medium] The overlap check in `fastrpc_inv_args()` incorrectly compares a kernel virtual address with a user-space virtual address, making it impossible to match. Pre-existing issues: - [Critical] `fastrpc_context_free()` leaks memory for DMA maps associated with handle arguments. - [Critical] Untrusted pointers read from a DSP-shared DMA buffer are used directly as source pointers in `copy_to_user`, allowing arbitrary kernel memory leak. -- commit 3ae5f11901e70743ac3aeecf63b6007d21fbb3e3 Author: Abhinav Parihar <[email protected]> misc: fastrpc: Add cache maintenance for non-coherent platforms This patch introduces helper functions to flush input buffers and invalidate output buffers for fastrpc argument buffers on non-coherent platforms. It uses the dma_buf_begin_cpu_access() and dma_buf_end_cpu_access() APIs to perform the cache operations. > diff --git a/drivers/misc/fastrpc.c b/drivers/misc/fastrpc.c > index 1080f9acf70a3..043b6a5548fb5 100644 > --- a/drivers/misc/fastrpc.c > +++ b/drivers/misc/fastrpc.c [ ... ] > +static void fastrpc_flush_args(struct fastrpc_invoke_ctx *ctx) > +{ > + union fastrpc_remote_arg *rpra = ctx->rpra; > + int i, inbufs, outbufs; > + > + inbufs = REMOTE_SCALARS_INBUFS(ctx->sc); > + outbufs = REMOTE_SCALARS_OUTBUFS(ctx->sc); > + > + for (i = 0; i < inbufs + outbufs; ++i) { > + int raix = ctx->olaps[i].raix; > + struct fastrpc_map *map = ctx->maps[raix]; > + > + if (raix + 1 > inbufs) > + continue; [Severity: High] Are output buffers missing cache maintenance before DSP execution? Before a device performs a DMA write to an output buffer, the CPU cache must be invalidated (or cleaned and invalidated) to prevent dirty cache lines from being asynchronously evicted and overwriting the device's DMA writes in RAM. Here, output buffers are explicitly skipped, leaving them vulnerable to eviction corruption during DSP execution. > + if (!map || !map->buf) > + continue; > + > + if (rpra[raix].buf.len && ctx->olaps[i].mstart) { [Severity: High] Does this overlap logic cause skipped cache maintenance for shared IN/OUT buffers? When an IN/OUT buffer is passed as both an inbuf and an outbuf pointing to the same memory, fastrpc_get_buff_overlaps() sorts them and assigns mstart = 0 to the completely overlapping second element. Because sort() is not stable, the OUT buffer could be sorted first and the IN buffer second, resulting in mstart = 0 for the IN buffer. Since OUT buffers are already skipped above, skipping the IN buffer here when mstart = 0 causes the flush to be completely bypassed. > + dma_buf_begin_cpu_access(map->buf, DMA_TO_DEVICE); [Severity: High] Could dma_buf_begin_cpu_access() fail here and bypass fence waits? dma_buf_begin_cpu_access() can return an error code like -ERESTARTSYS or -ENOMEM. Ignoring the return value silently bypasses fence waits and cache synchronization. [Severity: High] Is the CPU access API being misused to proxy DSP cache maintenance? Calling dma_buf_begin_cpu_access() and dma_buf_end_cpu_access() back-to-back to trigger device cache maintenance within the exporter appears to bypass the native DMA API. This broadcasts the sync operation to all attached devices and should likely be handled natively via dma_sync_sg_for_device(). > + dma_buf_end_cpu_access(map->buf, DMA_TO_DEVICE); > + } > + } > +} > + > +static void fastrpc_inv_args(struct fastrpc_invoke_ctx *ctx) > +{ > + union fastrpc_remote_arg *rpra = ctx->rpra; > + int i, inbufs, outbufs; > + > + inbufs = REMOTE_SCALARS_INBUFS(ctx->sc); > + outbufs = REMOTE_SCALARS_OUTBUFS(ctx->sc); > + > + for (i = 0; i < inbufs + outbufs; ++i) { > + int raix = ctx->olaps[i].raix; > + struct fastrpc_map *map = ctx->maps[raix]; > + > + if (raix + 1 <= inbufs) > + continue; > + if (!rpra[raix].buf.len) > + continue; > + if (!map || !map->buf) > + continue; > + > + /* > + * Skip invalidation if the argument overlaps with the > + * RPC control header page. > + */ > + if (((uintptr_t)rpra & PAGE_MASK) == > + ((uintptr_t)rpra[raix].buf.pv & PAGE_MASK)) [Severity: Medium] Does this overlap check incorrectly compare a kernel virtual address with a user-space virtual address? The rpra array is a kernel virtual address, but rpra[raix].buf.pv contains a user-space virtual address for mapped buffers (ctx->args[i].ptr). Comparing their page masks will almost never match, rendering this check essentially dead code. > + continue; > + > + if (ctx->olaps[i].mstart) { [Severity: High] Similar to the issue in fastrpc_flush_args(), could the OUT buffer get mstart = 0 due to the unstable sort in fastrpc_get_buff_overlaps()? If the IN buffer is sorted first, the OUT buffer gets mstart = 0 and is skipped, bypassing the invalidation. > + dma_buf_begin_cpu_access(map->buf, DMA_FROM_DEVICE); [Severity: High] Is the return value of dma_buf_begin_cpu_access() ignored here as well? This can lead to severe race conditions or silent data corruption if the call fails and implicit fences are not waited on. > + dma_buf_end_cpu_access(map->buf, DMA_TO_DEVICE); [Severity: High] Should this be DMA_FROM_DEVICE instead of DMA_TO_DEVICE? The DMA-BUF API requires the direction passed to the end access function to match the begin function. Passing DMA_TO_DEVICE after DMA_FROM_DEVICE causes a cache clean instead of completing the cache invalidation. This can overwrite the fresh DSP output data in RAM with stale CPU cache lines. > + } > + } > +} [ ... ] > static void fastrpc_context_free(struct kref *ref) > { > struct fastrpc_invoke_ctx *ctx; > struct fastrpc_channel_ctx *cctx; > unsigned long flags; > int i; > > ctx = container_of(ref, struct fastrpc_invoke_ctx, refcount); > cctx = ctx->cctx; > > for (i = 0; i < ctx->nbufs; i++) > fastrpc_map_put(ctx->maps[i]); [Severity: Critical] This is a pre-existing issue, but does fastrpc_context_free() leak memory for DMA maps associated with handle arguments? fastrpc_create_maps() allocates and initializes up to ctx->nscalars map objects, which includes handles (indices >= ctx->nbufs). However, this loop only cleans up from 0 to ctx->nbufs - 1. The maps corresponding to handles are never released, permanently leaking map objects and pinning DMA buffers. [ ... ] > static int fastrpc_put_args(struct fastrpc_invoke_ctx *ctx, > u32 kernel) > { > union fastrpc_remote_arg *rpra = ctx->rpra; [ ... ] > for (i = inbufs; i < ctx->nbufs; ++i) { > if (!ctx->maps[i]) { > void *src = (void *)(uintptr_t)rpra[i].buf.pv; > void *dst = (void *)(uintptr_t)ctx->args[i].ptr; > u64 len = rpra[i].buf.len; > > if (!kernel) { > if (copy_to_user((void __user *)dst, src, len)) > { [Severity: Critical] This is a pre-existing issue, but does this pass untrusted pointers read from a DSP-shared DMA buffer directly as source pointers to copy_to_user()? The rpra array is mapped bidirectionally to the DSP via the shared ctx->buf. A compromised or malicious DSP could overwrite rpra[i].buf.pv and rpra[i].buf.len with arbitrary kernel addresses. Since this code blindly passes this attacker-controlled address as the src to copy_to_user(), can the DSP leak arbitrary kernel memory to user space? -- Sashiko AI review ยท https://sashiko.dev/#/patchset/[email protected]?part=1
