On Mon, 1 Feb 2021 19:51:50 +0200 Yishai Hadas wrote:
> Currently mlx5 PCI VF and SF are enabled by default for RoCE
> functionality.
>
> Currently a user does not have the ability to disable RoCE for a PCI
> VF/SF device before such device is enumerated by the driver.
>
> User is also incapable to do such setting from smartnic scenario for a
> VF from the smartnic.
>
> Current 'enable_roce' device knob is limited to do setting only at
> driverinit time. By this time device is already created and firmware has
> already allocated necessary system memory for supporting RoCE.
>
> When a RoCE is disabled for the PCI VF/SF device, it saves 1 Mbyte of
> system memory per function. Such saving is helpful when running on low
> memory embedded platform with many VFs or SFs.
>
> Therefore, it is desired to empower user to disable RoCE functionality
> before a PCI SF/VF device is enumerated.

You say that the user on the VF/SF side wants to save memory, yet the
control knob is on the eswitch instance side, correct?

> This is achieved by extending existing 'port function' object to control
> capabilities of a function. This enables users to control capability of
> the device before enumeration.
>
> Examples when user prefers to disable RoCE for a VF when using switchdev
> mode:
>
> $ devlink port show pci/0000:06:00.0/1
> pci/0000:06:00.0/1: type eth netdev pf0vf0 flavour pcivf controller 0
> pfnum 0 vfnum 0 external false splittable false
>   function:
>     hw_addr 00:00:00:00:00:00 roce on
>
> $ devlink port function set pci/0000:06:00.0/1 roce off
>
> $ devlink port show pci/0000:06:00.0/1
> pci/0000:06:00.0/1: type eth netdev pf0vf0 flavour pcivf controller 0
> pfnum 0 vfnum 0 external false splittable false
>   function:
>     hw_addr 00:00:00:00:00:00 roce off
>
> FAQs:
> -----
> 1. What does roce on/off do?
> Ans: It disables RoCE capability of the function before its enumerated,
> so when driver reads the capability from the device firmware, it is
> disabled.
> At this point RDMA stack will not be able to create UD, QP1, RC, XRC
> type of QPs. When RoCE is disabled, the GID table of all ports of the
> device is disabled in the device and software stack.
>
> 2. How is the roce 'port function' option different from existing
> devlink param?
> Ans: RoCE attribute at the port function level disables the RoCE
> capability at the specific function level; while enable_roce only does
> at the software level.
>
> 3. Why is this option for disabling only RoCE and not the whole RDMA
> device?
> Ans: Because user still wants to use the RDMA device for non RoCE
> commands in more memory efficient way.

What are those "non-RoCE commands" that user may want to use "in a more
efficient way"?