rpc: Cleanup incorrect IS_ERR_OR_NULL in rpc.c

sashiko-bot Sun, 21 Jun 2026 21:08:30 -0700

Thank you for your contribution! Sashiko AI review found 7 potential issue(s) 
to consider:


Pre-existing issues:
- [High] Missing validation of GSP RPC message lengths from firmware can lead 
to integer underflow and massive kernel memory corruption.
- [High] Silent failure when rpc is NULL results in caller NULL pointer 
dereference.
- [High] Buffer overflow and uninitialized memory gap when a continuation GSP 
RPC chunk wraps around the message queue ring buffer.
- [High] Silent infinite stall of the GSP message queue when an oversized 
message is encountered.
- [High] Missing read memory barrier (`rmb()`) between reading the shared 
memory write pointer and reading the message payload.
- [High] Time-of-Check to Time-of-Use race condition on `rpc->length` bypasses 
bounds checks.
- [High] Trusting the CPU read pointer (`rptr`) from shared memory leads to 
arbitrary memory access.
--

commit 45236a2b4542b6f23a9a07cab6b6a3cd3588738d
Author: Hongling Zeng <[email protected]>

nouveau/gsp/rpc: Cleanup incorrect IS_ERR_OR_NULL in rpc.c

Clean up incorrect IS_ERR_OR_NULL() usage for functions that never
return NULL:
- r535_gsp_msgq_peek() never returns NULL
- r535_gsp_msgq_recv_one_elem() never returns NULL

These functions should be checked with IS_ERR() instead.

> diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/rpc.c 
> b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/rpc.c
> index 2ca6de4094e63..0a00c7d831c91 100644
> --- a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/rpc.c
> +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/rpc.c
> @@ -350,7 +350,7 @@ r535_gsp_msgq_recv(struct nvkm_gsp *gsp, u32 gsp_rpc_len, 
> int *retries)

[Severity: High]
This is a pre-existing issue, but earlier in r535_gsp_msgq_recv(),
returning NULL when an oversized message is encountered seems to stall the
queue:

r535_gsp_msgq_recv() {
    ...
    if (WARN_ON(rpc->length > max_rpc_size))
        return NULL;
    ...
}

Does returning NULL here without advancing the queue read pointer cause an
unrecoverable infinite stall, since subsequent reads would encounter the
exact same oversized message?

[Severity: High]
This isn't a bug introduced by this patch, but there also appears to be a
Time-of-Check to Time-of-Use race condition on rpc->length:

r535_gsp_msgq_recv() {
    ...
    if (WARN_ON(rpc->length > max_rpc_size))
        return NULL;

    buf = kvmalloc(max_t(u32, rpc->length, expected), GFP_KERNEL);
    ...
}

Since rpc points directly to DMA coherent memory shared with the GSP, could
the length be overwritten in shared memory between the check and its use in
kvmalloc() or info.gsp_rpc_len, bypassing the validation?

>               u32 size;
>  
>               rpc = r535_gsp_msgq_peek(gsp, sizeof(*rpc), info.retries);
> -             if (IS_ERR_OR_NULL(rpc)) {
> +             if (IS_ERR(rpc)) {
>                       kvfree(buf);
>                       return rpc;
>               }
> @@ -359,7 +359,7 @@ r535_gsp_msgq_recv(struct nvkm_gsp *gsp, u32 gsp_rpc_len, 
> int *retries)
>               info.continuation = true;
>  
>               rpc = r535_gsp_msgq_recv_one_elem(gsp, &info);

[Severity: High]
This is a pre-existing issue, but in r535_gsp_msgq_recv_one_elem() called
here, is there a potential integer underflow?

r535_gsp_msgq_recv_one_elem() {
    ...
    if (info->continuation)
        memcpy(buf, mqe->data + sizeof(struct nvfw_gsp_rpc),
               len - sizeof(struct nvfw_gsp_rpc));
    ...
}

If the continuation message length read directly from shared memory is less
than sizeof(struct nvfw_gsp_rpc) (32 bytes), does this calculation underflow
and lead to a massive out-of-bounds read and write?

[Severity: High]
This is another pre-existing issue in r535_gsp_msgq_recv_one_elem(). Could
a continuation chunk wrapping around the ring buffer cause an out-of-bounds
write?

r535_gsp_msgq_recv_one_elem() {
    ...
    if (expected) {
        mqe = (void *)((u8 *)gsp->shm.msgq.ptr + 0x1000 + 0 * 0x1000);
        memcpy(buf + len, mqe, expected);
    }
    ...
}

If the first copy only advanced the buffer logically by len - 32 due to
skipping the header, would writing the second part at buf + len leave a
32-byte gap of uninitialized memory and overflow the end of the allocated
buffer by 32 bytes?

[Severity: High]
This isn't a bug introduced by this patch, but does the underlying
r535_gsp_msgq_wait() function lack a required memory barrier?

r535_gsp_msgq_wait() {
    ...
    u32 wptr = *gsp->msgq.wptr;
    ...
}

When polling the shared memory write pointer, does the host CPU need an rmb()
after observing the updated write pointer?

Without it, could the CPU speculatively fetch stale payload data before the
pointer update is fully observed?

[Severity: High]
This is a pre-existing issue, but does r535_gsp_msgq_get_entry() safely
handle the CPU read pointer?

r535_gsp_msgq_get_entry() {
    u32 rptr = *gsp->msgq.rptr;
    return (void *)((u8 *)gsp->shm.msgq.ptr + GSP_PAGE_SIZE +
                    rptr * GSP_PAGE_SIZE);
}

Since gsp->msgq.rptr points to DMA coherent memory, could a compromised GSP
modify this pointer to an arbitrarily large value and force the host to read
out-of-bounds host memory?

> -             if (IS_ERR_OR_NULL(rpc)) {
> +             if (IS_ERR(rpc)) {
>                       kvfree(buf);
>                       return rpc;
>               }

[ ... ]

> @@ -499,7 +499,7 @@ r535_gsp_msg_recv(struct nvkm_gsp *gsp, int fn, u32 
> gsp_rpc_len)
>  
>  retry:
>       rpc = r535_gsp_msgq_peek(gsp, sizeof(*rpc), &retries);
> -     if (IS_ERR_OR_NULL(rpc))
> +     if (IS_ERR(rpc))
>               return rpc;
>  
>       rpc = r535_gsp_msgq_recv(gsp, gsp_rpc_len, &retries);

[Severity: High]
This isn't a bug introduced by this patch, but when r535_gsp_msgq_recv()
returns NULL (for example, when encountering an oversized message), does
this result in a silent failure upstream?

The NULL propagates out to r535_gsp_rpc_rm_ctrl_push(), which handles it
like this:

r535_gsp_rpc_rm_ctrl_push() {
    ...
    if (IS_ERR_OR_NULL(rpc)) {
        *params = NULL;
        return PTR_ERR(rpc);
    }
    ...
}

Since PTR_ERR(NULL) is 0 (success), does this mask the failure and cause
upstream callers to dereference a NULL pointer thinking the call succeeded?

-- 
Sashiko AI review · 
https://sashiko.dev/#/patchset/[email protected]?part=2

Re: [PATCH v4 2/5] nouveau/gsp/rpc: Cleanup incorrect IS_ERR_OR_NULL in rpc.c

Reply via email to