Thank you for your contribution! Sashiko AI review found 1 potential issue(s) 
to consider:

Pre-existing issues:
- [High] Silent failure and kernel panic when GSP RPC payload length exceeds 
max size
--

commit 3ec8c0b86229f7768cd026a5fe2006cc70c81806
Author: Hongling Zeng <[email protected]>

nouveau/gsp/rm/alloc: Cleanup IS_ERR_OR_NULL usage

r535_gsp_rpc_rm_free() calls nvkm_gsp_rpc_get() which never returns
NULL, only valid pointers or error pointers. Clean up by using IS_ERR()
instead, matching the actual return value contract.

> diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/alloc.c 
> b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/alloc.c
> index 46e3a29f2ad7f..c9f86c0e9b255 100644
> --- a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/alloc.c
> +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/alloc.c
> @@ -35,7 +35,7 @@ r535_gsp_rpc_rm_free(struct nvkm_gsp_object *object)
>                  client->object.handle, object->handle);
>  
>       rpc = nvkm_gsp_rpc_get(gsp, NV_VGPU_MSG_FUNCTION_FREE, sizeof(*rpc));
> -     if (WARN_ON(IS_ERR_OR_NULL(rpc)))
> +     if (WARN_ON(IS_ERR(rpc)))
>               return -EIO;

[Severity: High]
This isn't a bug introduced by this patch, but while reviewing this cleanup
of IS_ERR_OR_NULL(), I noticed a pre-existing issue where NULL pointers can
be returned and dereferenced in the GSP RPC receive paths.

If the GSP firmware returns an RPC message with an invalid length,
r535_gsp_msgq_recv() detects this but returns NULL instead of an error
pointer:

drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/rpc.c:r535_gsp_msgq_recv() {
        ...
        if (WARN_ON(rpc->length > max_rpc_size))
                return NULL;
        ...
}

This NULL propagates through functions like r535_gsp_rpc_rd(). Because callers
like r535_gsp_get_static_info() only check IS_ERR(), they will attempt to
dereference the NULL pointer:

drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/gsp.c:r535_gsp_get_static_info()
 {
        ...
        rpc = nvkm_gsp_rpc_rd(gsp, NV_VGPU_MSG_FUNCTION_GET_GSP_STATIC_INFO, 
sizeof(*rpc));
        if (IS_ERR(rpc))
                return PTR_ERR(rpc);

        gsp->internal.client.object.client = &gsp->internal.client;
        gsp->internal.client.object.parent = NULL;
        gsp->internal.client.object.handle = rpc->hInternalClient;
        ...
}

Could this NULL return also cause silent state corruption elsewhere? In
r535_gsp_rpc_rm_alloc_push(), a NULL return from nvkm_gsp_rpc_push() is
treated as a success condition:

drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/alloc.c:r535_gsp_rpc_rm_alloc_push()
 {
        ...
        rpc = nvkm_gsp_rpc_push(gsp, rpc, NVKM_GSP_RPC_REPLY_RECV, 
sizeof(*rpc));
        if (IS_ERR_OR_NULL(rpc))
                return rpc;
        ...
}

This means the length error could be silently converted into a false success,
leaving the object initialized in software but unallocated in hardware.

Should r535_gsp_msgq_recv() be updated to return an error pointer like
ERR_PTR(-EIO) instead of NULL to prevent these panics and false successes?

-- 
Sashiko AI review ยท 
https://sashiko.dev/#/patchset/[email protected]?part=3

Reply via email to