(Cc: Alex, John, Joel, Alistair, nouveau)
On Thu Sep 4, 2025 at 11:37 AM CEST, Danilo Krummrich wrote:
> On Thu Sep 4, 2025 at 12:11 AM CEST, Zhi Wang wrote:
>> diff --git a/drivers/vfio/pci/nvidia-vgpu/include/nvrm/gsp.h b/drivers/vfio/pci/nvidia-vgpu/include/nvrm/gsp.h
>> new file mode 100644
>> index 000000000000..c3fb7b299533
>> --- /dev/null
>> +++ b/drivers/vfio/pci/nvidia-vgpu/include/nvrm/gsp.h
>> @@ -0,0 +1,18 @@
>> +/* SPDX-License-Identifier: MIT */
>> +#ifndef __NVRM_GSP_H__
>> +#define __NVRM_GSP_H__
>> +
>> +#include <nvrm/nvtypes.h>
>> +
>> +/* Excerpt of RM headers from https://github.com/NVIDIA/open-gpu-kernel-modules/tree/570 */
>> +
>> +#define NV2080_CTRL_CMD_GSP_GET_FEATURES (0x20803601)
>> +
>> +typedef struct NV2080_CTRL_GSP_GET_FEATURES_PARAMS {
>> +	NvU32 gspFeatures;
>> +	NvBool bValid;
>> +	NvBool bDefaultGspRmGpu;
>> +	NvU8 firmwareVersion[GSP_MAX_BUILD_VERSION_LENGTH];
>> +} NV2080_CTRL_GSP_GET_FEATURES_PARAMS;
>> +
>> +#endif
>
> <snip>
>
>> +static struct version supported_version_list[] = {
>> +	{ 18, 1, "570.144" },
>> +};
>
> nova-core won't provide any firmware specific APIs; it is meant to serve as a
> hardware and firmware abstraction layer for higher level drivers, such as vGPU
> or nova-drm.
>
> As a general rule the interface between nova-core and higher level drivers
> must not leak any hardware or firmware specific details, but work on a higher
> level abstraction layer.
>
> Now, I recognize that at some point it might be necessary to do some kind of
> versioning in this API anyway, for instance when the semantics of the
> firmware API change too significantly.
>
> However, this would be a separate API where nova-core, at the initial
> handshake, then asks clients to use e.g. v2 of the nova-core API, still
> hiding any firmware and hardware details from the client.
>
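To make the handshake idea a bit more concrete, here is a minimal sketch in plain
Rust. All names (ApiVersion, FirmwareInfo, Handshake, handshake(),
client_accepts(), semantics_rev) are made up for illustration and are not part
of nova-core; the threshold used to pick v2 is an assumption as well. The only
point is that the client sees a nova-core API version and nothing about the
firmware.

```rust
// Hypothetical sketch: none of these names exist in nova-core today.

/// Version of the nova-core client API. It says nothing about the firmware.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ApiVersion {
    V1,
    V2,
}

/// Firmware details, private to nova-core and never handed to clients.
struct FirmwareInfo {
    /// Made-up counter, bumped when firmware semantics change too much.
    semantics_rev: u32,
}

/// Everything a client gets to see at the initial handshake.
#[derive(Debug)]
pub struct Handshake {
    pub api: ApiVersion,
}

/// nova-core side: map private firmware knowledge to a client API version.
fn handshake(fw: &FirmwareInfo) -> Handshake {
    let api = if fw.semantics_rev >= 2 {
        ApiVersion::V2
    } else {
        ApiVersion::V1
    };
    Handshake { api }
}

/// Client side: proceed only if the requested API version is implemented.
pub fn client_accepts(h: &Handshake, implemented: &[ApiVersion]) -> bool {
    implemented.contains(&h.api)
}
```

A client that only implements v1 would simply fail to attach when asked for v2,
without ever learning which firmware is running underneath.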
> Some more general notes, since I also had a look at the nova-core <-> vGPU
> interface patches in your tree (even though I'm aware that they're not part of
> the RFC of course):
>
> The interface for the general lifecycle management for any clients attaching
> to nova-core (vGPU, nova-drm) should be common and not specific to vGPU. (The
> same goes for interfaces that will be used by vGPU and nova-drm.)
>
> The interface nova-core provides for that should be designed in Rust, so we
> can take advantage of all the features the type system provides us with when
> connecting to Rust clients (nova-drm).
>
> For vGPU, we can then monomorphize those types into the corresponding C
> structures and provide the corresponding functions very easily.
>
> Doing it the other way around would be a very bad idea, since the Rust type
> system is much more powerful and hence it'd be very hard to avoid introducing
> limitations on the Rust side of things.
>
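As a rough illustration of "design it in Rust, then derive the C side", here is
a userspace-style sketch. Everything in it is hypothetical: the Client trait,
Registration, CClientOps and the nova_client_register() /
nova_client_unregister() functions do not exist in nova-core, and it uses std
(Box, std::os::raw) rather than kernel APIs. A real export would also need an
unmangled symbol name and a matching C header.

```rust
// Hypothetical sketch: none of these names exist in nova-core today.
use std::os::raw::c_void;

/// Rust-side lifecycle interface, implemented by any client.
pub trait Client {
    fn attach(&mut self);
    fn detach(&mut self);
}

/// Generic registration: attaches on creation, detaches on drop.
pub struct Registration<T: Client> {
    client: T,
}

impl<T: Client> Registration<T> {
    pub fn new(mut client: T) -> Self {
        client.attach();
        Self { client }
    }
}

impl<T: Client> Drop for Registration<T> {
    fn drop(&mut self) {
        self.client.detach();
    }
}

/// C view of a client: plain callbacks plus an opaque context pointer.
#[repr(C)]
pub struct CClientOps {
    pub attach: extern "C" fn(*mut c_void),
    pub detach: extern "C" fn(*mut c_void),
    pub data: *mut c_void,
}

impl Client for CClientOps {
    fn attach(&mut self) {
        (self.attach)(self.data)
    }
    fn detach(&mut self) {
        (self.detach)(self.data)
    }
}

/// The generic type, instantiated once for the C client.
pub type CRegistration = Registration<CClientOps>;

/// C entry point: create a registration and hand out an owning pointer.
pub extern "C" fn nova_client_register(ops: CClientOps) -> *mut CRegistration {
    Box::into_raw(Box::new(Registration::new(ops)))
}

/// C entry point: take ownership back and drop, which calls detach.
///
/// Safety: `reg` must be NULL or a pointer returned by nova_client_register()
/// that has not been unregistered yet.
pub unsafe extern "C" fn nova_client_unregister(reg: *mut CRegistration) {
    if !reg.is_null() {
        drop(unsafe { Box::from_raw(reg) });
    }
}
```

The Rust client (nova-drm in this picture) would implement Client directly and
keep the drop-based cleanup; the C client only ever sees the two functions and
the ops structure.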
> Hence, I recommend starting with some patches defining the API in nova-core
> for the general lifecycle (in Rust), so we can take it from there.
>
> Another note: I don't see any use of the auxiliary bus in vGPU. Any clients
> should attach via the auxiliary bus API; it provides proper matching where
> there's more than one compatible GPU in the system. nova-core already registers
> an auxiliary device for each bound PCI device.
>
> Please don't re-implement what the auxiliary bus already does for us.
>
> - Danilo