On Tue, 2021-02-02 at 18:14 -0800, Jakub Kicinski wrote:
> On Mon, 1 Feb 2021 19:51:50 +0200 Yishai Hadas wrote:
> > Currently mlx5 PCI VF and SF are enabled by default for RoCE
> > functionality.
> >
> > Currently a user does not have the ability to disable RoCE for a PCI
> > VF/SF device before such device is enumerated by the driver.
> >
> > User is also incapable to do such setting from smartnic scenario for a
> > VF from the smartnic.
> >
> > Current 'enable_roce' device knob is limited to do setting only at
> > driverinit time. By this time device is already created and firmware has
> > already allocated necessary system memory for supporting RoCE.
> >
> > When a RoCE is disabled for the PCI VF/SF device, it saves 1 Mbyte of
> > system memory per function. Such saving is helpful when running on low
> > memory embedded platform with many VFs or SFs.
> >
> > Therefore, it is desired to empower user to disable RoCE functionality
> > before a PCI SF/VF device is enumerated.
>
> You say that the user on the VF/SF side wants to save memory, yet
> the control knob is on the eswitch instance side, correct?
>

Yes, the user in this case is the admin who controls the provisioned
network functions (SFs/VFs). Turning this knob off lets the admin create
more of them when memory is the limiting resource.

> > This is achieved by extending existing 'port function' object to control
> > capabilities of a function. This enables users to control capability of
> > the device before enumeration.
> >
> > Examples when user prefers to disable RoCE for a VF when using switchdev
> > mode:
> >
> > $ devlink port show pci/0000:06:00.0/1
> > pci/0000:06:00.0/1: type eth netdev pf0vf0 flavour pcivf controller 0
> >     pfnum 0 vfnum 0 external false splittable false
> >   function:
> >     hw_addr 00:00:00:00:00:00 roce on
> >
> > $ devlink port function set pci/0000:06:00.0/1 roce off
> >
> > $ devlink port show pci/0000:06:00.0/1
> > pci/0000:06:00.0/1: type eth netdev pf0vf0 flavour pcivf controller 0
> >     pfnum 0 vfnum 0 external false splittable false
> >   function:
> >     hw_addr 00:00:00:00:00:00 roce off
> >
> > FAQs:
> > -----
> > 1. What does roce on/off do?
> > Ans: It disables RoCE capability of the function before its enumerated,
> > so when driver reads the capability from the device firmware, it is
> > disabled.
> > At this point RDMA stack will not be able to create UD, QP1, RC, XRC
> > type of QPs. When RoCE is disabled, the GID table of all ports of the
> > device is disabled in the device and software stack.
> >
> > 2. How is the roce 'port function' option different from existing
> > devlink param?
> > Ans: RoCE attribute at the port function level disables the RoCE
> > capability at the specific function level; while enable_roce only does
> > at the software level.
> >
> > 3. Why is this option for disabling only RoCE and not the whole RDMA
> > device?
> > Ans: Because user still wants to use the RDMA device for non RoCE
> > commands in more memory efficient way.
>
> What are those "non-RoCE commands" that user may want to use "in a more
> efficient way"?

RAW eth QP. I think you already know this one: it is a very thin layer
that doesn't require the whole RDMA stack.
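
For illustration only (this is not part of the patch set): a minimal
shell sketch of how an admin on the eswitch side might apply the quoted
'roce off' setting to several port functions before the VFs/SFs are
enumerated. The device name and the command syntax come from the example
quoted above; the helper name roce_off_cmds and the port index range 1..4
are assumptions made up for this sketch. It only prints the commands so
they can be reviewed first.

```shell
#!/bin/sh
# Sketch only: print the devlink commands that would turn RoCE off for a
# range of port indices. Nothing is executed against the device.
# DEV defaults to the device used in the quoted example (assumption: the
# ports of interest live on that device).
DEV="${DEV:-pci/0000:06:00.0}"

roce_off_cmds() {
    # $1 = first port index, $2 = last port index (inclusive).
    i="$1"
    while [ "$i" -le "$2" ]; do
        echo "devlink port function set $DEV/$i roce off"
        i=$((i + 1))
    done
}

# Dry run for the hypothetical port indices 1..4.
FIRST=1
LAST=4
roce_off_cmds "$FIRST" "$LAST"

# The patch description quotes 1 MByte saved per function.
echo "estimated saving: $(( (LAST - FIRST + 1) * 1 )) MByte"
```

Piping the printed commands to sh would apply them. Per the patch
description, this has to happen before the function is enumerated by the
driver for the memory saving to take effect.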