Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
Pre-existing issues: - [High] Silent failure and kernel panic when GSP RPC payload length exceeds max size -- commit 3ec8c0b86229f7768cd026a5fe2006cc70c81806 Author: Hongling Zeng <[email protected]> nouveau/gsp/rm/alloc: Cleanup IS_ERR_OR_NULL usage r535_gsp_rpc_rm_free() calls nvkm_gsp_rpc_get() which never returns NULL, only valid pointers or error pointers. Clean up by using IS_ERR() instead, matching the actual return value contract. > diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/alloc.c > b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/alloc.c > index 46e3a29f2ad7f..c9f86c0e9b255 100644 > --- a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/alloc.c > +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/alloc.c > @@ -35,7 +35,7 @@ r535_gsp_rpc_rm_free(struct nvkm_gsp_object *object) > client->object.handle, object->handle); > > rpc = nvkm_gsp_rpc_get(gsp, NV_VGPU_MSG_FUNCTION_FREE, sizeof(*rpc)); > - if (WARN_ON(IS_ERR_OR_NULL(rpc))) > + if (WARN_ON(IS_ERR(rpc))) > return -EIO; [Severity: High] This isn't a bug introduced by this patch, but while reviewing this cleanup of IS_ERR_OR_NULL(), I noticed a pre-existing issue where NULL pointers can be returned and dereferenced in the GSP RPC receive paths. If the GSP firmware returns an RPC message with an invalid length, r535_gsp_msgq_recv() detects this but returns NULL instead of an error pointer: drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/rpc.c:r535_gsp_msgq_recv() { ... if (WARN_ON(rpc->length > max_rpc_size)) return NULL; ... } This NULL propagates through functions like r535_gsp_rpc_rd(). Because callers like r535_gsp_get_static_info() only check IS_ERR(), they will attempt to dereference the NULL pointer: drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/gsp.c:r535_gsp_get_static_info() { ... rpc = nvkm_gsp_rpc_rd(gsp, NV_VGPU_MSG_FUNCTION_GET_GSP_STATIC_INFO, sizeof(*rpc)); if (IS_ERR(rpc)) return PTR_ERR(rpc); gsp->internal.client.object.client = &gsp->internal.client; gsp->internal.client.object.parent = NULL; gsp->internal.client.object.handle = rpc->hInternalClient; ... } Could this NULL return also cause silent state corruption elsewhere? In r535_gsp_rpc_rm_alloc_push(), a NULL return from nvkm_gsp_rpc_push() is treated as a success condition: drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/alloc.c:r535_gsp_rpc_rm_alloc_push() { ... rpc = nvkm_gsp_rpc_push(gsp, rpc, NVKM_GSP_RPC_REPLY_RECV, sizeof(*rpc)); if (IS_ERR_OR_NULL(rpc)) return rpc; ... } This means the length error could be silently converted into a false success, leaving the object initialized in software but unallocated in hardware. Should r535_gsp_msgq_recv() be updated to return an error pointer like ERR_PTR(-EIO) instead of NULL to prevent these panics and false successes? -- Sashiko AI review ยท https://sashiko.dev/#/patchset/[email protected]?part=3
