On Fri, Mar 01, 2019 at 05:21:13PM +0000, Parav Pandit wrote: > > > > -----Original Message----- > > From: Greg KH <gre...@linuxfoundation.org> > > Sent: Friday, March 1, 2019 1:22 AM > > To: Parav Pandit <pa...@mellanox.com> > > Cc: netdev@vger.kernel.org; linux-ker...@vger.kernel.org; > > michal.l...@markovi.net; da...@davemloft.net; Jiri Pirko > > <j...@mellanox.com> > > Subject: Re: [RFC net-next 8/8] net/mlx5: Add subdev driver to bind to > > subdev devices > > > > On Thu, Feb 28, 2019 at 11:37:52PM -0600, Parav Pandit wrote: > > > Add a subdev driver to probe the subdev devices and create fake > > > netdevice for it. > > > > So I'm guessing here is the "meat" of the whole goal here? > > > > You just want multiple netdevices per PCI device? Why can't you do that > > today in your PCI driver? > > > Yes, but it just not multiple netdevices. > Let me please elaborate in detail. > > There is a swichdev mode of a PCI function for netdevices. > In this mode a given netdev has additional control netdev (called representor > netdevice = rep-ndev). > This rep-ndev is attached to OVS for adding rules, offloads etc using > standard tc, netfilter infra. > Currently this rep-ndev controls switch side of the settings, but not the > host side of netdev. > So there is discussion to create another netdev or devlink port.. > > Additionally this subdev has optional rdma device too. > > And when we are in switchdev mode, this rdma dev has similar rdma rep device > for control. > > In some cases we actually don't create netdev when it is in InfiniBand mode. > Here there is PCI device->rdma_device. > > In other case, a given sub device for rdma is dual port device, having > netdevice for each that can use existing netdev->dev_port. > > Creating 4 devices of two different classes using one iproute2/ip or > iproute2/rdma command is horrible thing to do.
Why is that? > In case if this sub device has to be a passthrough device, ip link command > will fail badly that day, because we are creating some sub device which is > not even a netdevice. But it is a network device, right? > So iproute2/devlink which works on bus+device, mainly PCI today, seems right > abstraction point to create sub devices. > This also extends to map ports of the device, health, registers debug, etc > rich infrastructure that is already built. > > Additionally, we don't want mlx driver and other drivers to go through its > child devices (split logic in netdev and rdma) for power management. And how is power management going to work with your new devices? All you have here is a tiny shim around a driver bus, I do not see any new functionality, and as others have said, no way to actually share, or split up, the PCI resources. > Kernel core code does that well today, that we like to leverage through > subdev bus or mfd pm callbacks. > > So it is lot more than just creating netdevices. But that's all you are showing here :) > > What problem are you trying to solve that others also are having that > > requires all of this? > > > > Adding a new bus type and subsystem is fine, but usually we want more > > than just one user of it, as this does not really show how it is exercised > > very > > well. > This subdev and devlink infrastructure solves this problem of creating > smaller sub devices out of one PCI device. > Someone has to start.. :-) That is what a mfd should allow you to do. > To my knowledge, currently Netronome, Broadcom and Mellanox are actively > using this devlink and switchdev infra today. Where are they "using it"? This patchset does not show that. > > Ideally 3 users would be there as that is when it proves itself that it is > > flexible enough. > > > > We were looking at drivers/visorbus if we can repurpose it, but GUID device > naming scheme is just not user friendly. You can always change the naming scheme if needed. But why isn't a GUID ok? It's very easy to reserve properly, and you do not need a central naming "authority". > > Would just using the mfd subsystem work better for you? That provides > > core support for "multi-function" drivers/devices already. What is missing > > from that subsystem that does not work for you here? > > > We were not aware of mfd until now. I looked at very high level now. It's a > wrapper to platform devices and seems widely use. > Before subdev proposal, Jason suggested an alternative is to create platform > devices and driver attach to it. > > When I read kernel documentation [1], it says "platform devices typically > appear as autonomous entities" > Here instead of autonomy, it is in user's control. > Platform devices probably don't disappear a lot in live system as opposed to > subdevices which are created and removed dynamically a lot often. > > Not sure if platform device is abuse for this purpose or not. No, do not abuse a platform device. You should be able to just use a normal PCI device for this just fine, and if not, we should be able to make the needed changes to mfd for that. thanks, greg k-h