Thank you for your contribution! Sashiko AI review found 5 potential issue(s) to consider:
Pre-existing issues: - [Critical] The driver blindly trusts the `length` field of GSP messages and continuation records, leading to multiple integer underflows and massive out-of-bounds memory copies. - [Critical] Missing bounds check on GSP continuation message length allows a heap buffer overflow. - [High] Returning `NULL` on oversized GSP messages leads to infinite loops in the message queue worker and a false success bubbling up to callers, causing subsequent NULL pointer dereferences. - [High] A continuation message wrapping around the message queue ring buffer causes memory corruption and an overrun due to incorrect pointer math. - [High] Missing read memory barrier (rmb or dma_rmb) after reading the ring buffer write pointer. -- commit b655637a9811533563a43c05f87d130fe957a519 Author: Hongling Zeng <[email protected]> Subject: nouveau/gsp/rpc: Cleanup incorrect IS_ERR_OR_NULL in rpc.c This commit cleans up IS_ERR_OR_NULL() usage for functions that supposedly never return NULL. It changes the checks after r535_gsp_msgq_peek(), r535_gsp_msgq_recv_one_elem(), and r535_gsp_msgq_recv() to use IS_ERR() instead. > diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/rpc.c > b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/rpc.c > index b4b2b5c00ab4e..5b89c8835a2e4 100644 > --- a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/rpc.c > +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/rpc.c > @@ -350,7 +350,7 @@ r535_gsp_msgq_recv(struct nvkm_gsp *gsp, u32 gsp_rpc_len, > int *retries) > u32 size; > > rpc = r535_gsp_msgq_peek(gsp, sizeof(*rpc), info.retries); > - if (IS_ERR_OR_NULL(rpc)) { > + if (IS_ERR(rpc)) { [Severity: High] This is a pre-existing issue, but does the polling loop in r535_gsp_msgq_wait() need a read memory barrier? r535_gsp_msgq_peek() calls r535_gsp_msgq_wait(), which polls the GSP write pointer *gsp->msgq.wptr from DMA coherent memory: while (rptr == wptr || ...) { ... wptr = *gsp->msgq.wptr; } Without an rmb() or dma_rmb() before returning, could weakly ordered CPUs speculatively fetch the payload data before the write pointer read is ordered, resulting in stale data being read from the ring buffer? > kvfree(buf); > return rpc; > } > @@ -359,7 +359,7 @@ r535_gsp_msgq_recv(struct nvkm_gsp *gsp, u32 gsp_rpc_len, > int *retries) > info.continuation = true; > > rpc = r535_gsp_msgq_recv_one_elem(gsp, &info); > - if (IS_ERR_OR_NULL(rpc)) { > + if (IS_ERR(rpc)) { [Severity: Critical] This isn't a bug introduced by this patch, but does the firmware's message length need to be validated to prevent integer underflows? If rpc->length is smaller than sizeof(struct nvfw_gsp_rpc), does this underflow the length calculations and cause an out-of-bounds memcpy()? In r535_gsp_msgq_recv_one_elem(), the payload length calculation uses: if (info->continuation) memcpy(buf, mqe->data + sizeof(struct nvfw_gsp_rpc), len - sizeof(struct nvfw_gsp_rpc)); Additionally, in this loop in r535_gsp_msgq_recv(), the offset underflows: size = info.gsp_rpc_len - sizeof(*rpc); expected -= size; info.gsp_rpc_buf += size; There is a similar underflow in r535_gsp_msg_dump(): msg->length - sizeof(*msg) Could a malicious or buggy firmware send a small length and cause these calculations to wrap around, leading to massive memory corruption? [Severity: Critical] This is a pre-existing issue, but does this code check if the continuation message actually fits within the remaining allocated buffer? The buffer was allocated based on the initial expected message length. If a continuation chunk payload (info.gsp_rpc_len - sizeof(*rpc)) is larger than the remaining capacity (expected), does this overwrite the bounds of the kvmalloc'd buffer? [Severity: High] This isn't a bug introduced by this patch, but is there a miscalculation when a continuation message wraps around the ring buffer? In r535_gsp_msgq_recv_one_elem(), the code copies a wrapped continuation payload in two parts. The first part writes len - sizeof(struct nvfw_gsp_rpc) bytes. However, for the second part it copies: if (expected) { mqe = (void *)((u8 *)gsp->shm.msgq.ptr + 0x1000 + 0 * 0x1000); memcpy(buf + len, mqe, expected); } Should the destination pointer be buf + len - sizeof(struct nvfw_gsp_rpc) instead of buf + len? Does using buf + len leave a gap of uninitialized memory and subsequently overrun the allocated buffer by sizeof(struct nvfw_gsp_rpc) bytes? > kvfree(buf); > return rpc; > } > @@ -499,7 +499,7 @@ r535_gsp_msg_recv(struct nvkm_gsp *gsp, int fn, u32 > gsp_rpc_len) > > retry: > rpc = r535_gsp_msgq_peek(gsp, sizeof(*rpc), &retries); > - if (IS_ERR_OR_NULL(rpc)) > + if (IS_ERR(rpc)) > return rpc; > > rpc = r535_gsp_msgq_recv(gsp, gsp_rpc_len, &retries); [Severity: High] This is a pre-existing issue, but is it safe for r535_gsp_msgq_recv() to return NULL when a message is oversized? r535_gsp_msgq_recv() has the following check: if (WARN_ON(rpc->length > max_rpc_size)) return NULL; If an oversized message is received, this returns NULL instead of advancing the message queue read pointer and returning an error pointer like ERR_PTR(-EIO). Does this cause the message queue worker to get stuck in an infinite loop repeatedly processing the same invalid message? Also, could this NULL bubble up as a false success to callers that check for errors via IS_ERR(), causing subsequent NULL pointer dereferences? -- Sashiko AI review ยท https://sashiko.dev/#/patchset/[email protected]?part=2
