Re: [PATCH v1] misc: fastrpc: Add cache maintenance for non-coherent platforms

sashiko-bot Thu, 04 Jun 2026 13:02:51 -0700

Thank you for your contribution! Sashiko AI review found 8 potential issue(s) 
to consider:


New issues:
- [High] `fastrpc_inv_args()` incorrectly passes `DMA_TO_DEVICE` to 
`dma_buf_end_cpu_access()` after beginning access with `DMA_FROM_DEVICE`.
- [High] Missing cache maintenance for output buffers before DSP execution.
- [High] The return value of `dma_buf_begin_cpu_access()` is ignored.
- [High] Misuse of `dma_buf_begin_cpu_access()` to perform cache maintenance 
for the DSP.
- [High] Cache maintenance is incorrectly skipped for shared IN/OUT buffers due 
to flawed overlap logic.
- [Medium] The overlap check in `fastrpc_inv_args()` incorrectly compares a 
kernel virtual address with a user-space virtual address, making it impossible 
to match.

Pre-existing issues:
- [Critical] `fastrpc_context_free()` leaks memory for DMA maps associated with 
handle arguments.
- [Critical] Untrusted pointers read from a DSP-shared DMA buffer are used 
directly as source pointers in `copy_to_user`, allowing arbitrary kernel memory 
leak.
--

commit 3ae5f11901e70743ac3aeecf63b6007d21fbb3e3
Author: Abhinav Parihar <[email protected]>

misc: fastrpc: Add cache maintenance for non-coherent platforms

This patch introduces helper functions to flush input buffers and invalidate
output buffers for fastrpc argument buffers on non-coherent platforms. It uses
the dma_buf_begin_cpu_access() and dma_buf_end_cpu_access() APIs to perform
the cache operations.

> diff --git a/drivers/misc/fastrpc.c b/drivers/misc/fastrpc.c
> index 1080f9acf70a3..043b6a5548fb5 100644
> --- a/drivers/misc/fastrpc.c
> +++ b/drivers/misc/fastrpc.c
[ ... ]
> +static void fastrpc_flush_args(struct fastrpc_invoke_ctx *ctx)
> +{
> +     union fastrpc_remote_arg *rpra = ctx->rpra;
> +     int i, inbufs, outbufs;
> +
> +     inbufs = REMOTE_SCALARS_INBUFS(ctx->sc);
> +     outbufs = REMOTE_SCALARS_OUTBUFS(ctx->sc);
> +
> +     for (i = 0; i < inbufs + outbufs; ++i) {
> +             int raix = ctx->olaps[i].raix;
> +             struct fastrpc_map *map = ctx->maps[raix];
> +
> +             if (raix + 1 > inbufs)
> +                     continue;

[Severity: High]
Are output buffers missing cache maintenance before DSP execution?

Before a device performs a DMA write to an output buffer, the CPU cache must
be invalidated (or cleaned and invalidated) to prevent dirty cache lines from
being asynchronously evicted and overwriting the device's DMA writes in RAM.

Here, output buffers are explicitly skipped, leaving them vulnerable to eviction
corruption during DSP execution.

> +             if (!map || !map->buf)
> +                     continue;
> +
> +             if (rpra[raix].buf.len && ctx->olaps[i].mstart) {

[Severity: High]
Does this overlap logic cause skipped cache maintenance for shared IN/OUT
buffers?

When an IN/OUT buffer is passed as both an inbuf and an outbuf pointing to the
same memory, fastrpc_get_buff_overlaps() sorts them and assigns mstart = 0 to
the completely overlapping second element.

Because sort() is not stable, the OUT buffer could be sorted first and the IN
buffer second, resulting in mstart = 0 for the IN buffer. Since OUT buffers are
already skipped above, skipping the IN buffer here when mstart = 0 causes the
flush to be completely bypassed.

> +                     dma_buf_begin_cpu_access(map->buf, DMA_TO_DEVICE);

[Severity: High]
Could dma_buf_begin_cpu_access() fail here and bypass fence waits?

dma_buf_begin_cpu_access() can return an error code like -ERESTARTSYS or
-ENOMEM. Ignoring the return value silently bypasses fence waits and cache
synchronization.

[Severity: High]
Is the CPU access API being misused to proxy DSP cache maintenance?

Calling dma_buf_begin_cpu_access() and dma_buf_end_cpu_access() back-to-back
to trigger device cache maintenance within the exporter appears to bypass the
native DMA API. This broadcasts the sync operation to all attached devices and
should likely be handled natively via dma_sync_sg_for_device().

> +                     dma_buf_end_cpu_access(map->buf, DMA_TO_DEVICE);
> +             }
> +     }
> +}
> +
> +static void fastrpc_inv_args(struct fastrpc_invoke_ctx *ctx)
> +{
> +     union fastrpc_remote_arg *rpra = ctx->rpra;
> +     int i, inbufs, outbufs;
> +
> +     inbufs = REMOTE_SCALARS_INBUFS(ctx->sc);
> +     outbufs = REMOTE_SCALARS_OUTBUFS(ctx->sc);
> +
> +     for (i = 0; i < inbufs + outbufs; ++i) {
> +             int raix = ctx->olaps[i].raix;
> +             struct fastrpc_map *map = ctx->maps[raix];
> +
> +             if (raix + 1 <= inbufs)
> +                     continue;
> +             if (!rpra[raix].buf.len)
> +                     continue;
> +             if (!map || !map->buf)
> +                     continue;
> +
> +             /*
> +              * Skip invalidation if the argument overlaps with the
> +              * RPC control header page.
> +              */
> +             if (((uintptr_t)rpra & PAGE_MASK) ==
> +                     ((uintptr_t)rpra[raix].buf.pv & PAGE_MASK))

[Severity: Medium]
Does this overlap check incorrectly compare a kernel virtual address with a
user-space virtual address?

The rpra array is a kernel virtual address, but rpra[raix].buf.pv contains a
user-space virtual address for mapped buffers (ctx->args[i].ptr). Comparing
their page masks will almost never match, rendering this check essentially
dead code.

> +                     continue;
> +
> +             if (ctx->olaps[i].mstart) {

[Severity: High]
Similar to the issue in fastrpc_flush_args(), could the OUT buffer get
mstart = 0 due to the unstable sort in fastrpc_get_buff_overlaps()?

If the IN buffer is sorted first, the OUT buffer gets mstart = 0 and is
skipped, bypassing the invalidation.

> +                     dma_buf_begin_cpu_access(map->buf, DMA_FROM_DEVICE);

[Severity: High]
Is the return value of dma_buf_begin_cpu_access() ignored here as well?

This can lead to severe race conditions or silent data corruption if the call
fails and implicit fences are not waited on.

> +                     dma_buf_end_cpu_access(map->buf, DMA_TO_DEVICE);

[Severity: High]
Should this be DMA_FROM_DEVICE instead of DMA_TO_DEVICE?

The DMA-BUF API requires the direction passed to the end access function to
match the begin function. Passing DMA_TO_DEVICE after DMA_FROM_DEVICE causes a
cache clean instead of completing the cache invalidation. This can overwrite
the fresh DSP output data in RAM with stale CPU cache lines.

> +             }
> +     }
> +}

[ ... ]

>  static void fastrpc_context_free(struct kref *ref)
>  {
>       struct fastrpc_invoke_ctx *ctx;
>       struct fastrpc_channel_ctx *cctx;
>       unsigned long flags;
>       int i;
>  
>       ctx = container_of(ref, struct fastrpc_invoke_ctx, refcount);
>       cctx = ctx->cctx;
>  
>       for (i = 0; i < ctx->nbufs; i++)
>               fastrpc_map_put(ctx->maps[i]);

[Severity: Critical]
This is a pre-existing issue, but does fastrpc_context_free() leak memory for
DMA maps associated with handle arguments?

fastrpc_create_maps() allocates and initializes up to ctx->nscalars map
objects, which includes handles (indices >= ctx->nbufs). However, this loop
only cleans up from 0 to ctx->nbufs - 1. The maps corresponding to handles are
never released, permanently leaking map objects and pinning DMA buffers.

[ ... ]

>  static int fastrpc_put_args(struct fastrpc_invoke_ctx *ctx,
>                           u32 kernel)
>  {
>       union fastrpc_remote_arg *rpra = ctx->rpra;
[ ... ]
>       for (i = inbufs; i < ctx->nbufs; ++i) {
>               if (!ctx->maps[i]) {
>                       void *src = (void *)(uintptr_t)rpra[i].buf.pv;
>                       void *dst = (void *)(uintptr_t)ctx->args[i].ptr;
>                       u64 len = rpra[i].buf.len;
>  
>                       if (!kernel) {
>                               if (copy_to_user((void __user *)dst, src, len)) 
> {

[Severity: Critical]
This is a pre-existing issue, but does this pass untrusted pointers read from
a DSP-shared DMA buffer directly as source pointers to copy_to_user()?

The rpra array is mapped bidirectionally to the DSP via the shared ctx->buf.
A compromised or malicious DSP could overwrite rpra[i].buf.pv and
rpra[i].buf.len with arbitrary kernel addresses.

Since this code blindly passes this attacker-controlled address as the src to
copy_to_user(), can the DSP leak arbitrary kernel memory to user space?

-- 
Sashiko AI review · 
https://sashiko.dev/#/patchset/[email protected]?part=1

Re: [PATCH v1] misc: fastrpc: Add cache maintenance for non-coherent platforms

Reply via email to