Hi Kirti, > -----Original Message----- > From: Kirti Wankhede <kwankh...@nvidia.com> > Sent: Tuesday, March 5, 2019 9:51 PM > To: Parav Pandit <pa...@mellanox.com>; Jakub Kicinski > <jakub.kicin...@netronome.com> > Cc: Or Gerlitz <gerlitz...@gmail.com>; netdev@vger.kernel.org; linux- > ker...@vger.kernel.org; michal.l...@markovi.net; da...@davemloft.net; > gre...@linuxfoundation.org; Jiri Pirko <j...@mellanox.com> > Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension > > > > On 3/6/2019 6:14 AM, Parav Pandit wrote: > > Hi Greg, Kirti, > > > >> -----Original Message----- > >> From: Parav Pandit > >> Sent: Tuesday, March 5, 2019 5:45 PM > >> To: Parav Pandit <pa...@mellanox.com>; Kirti Wankhede > >> <kwankh...@nvidia.com>; Jakub Kicinski > <jakub.kicin...@netronome.com> > >> Cc: Or Gerlitz <gerlitz...@gmail.com>; netdev@vger.kernel.org; linux- > >> ker...@vger.kernel.org; michal.l...@markovi.net; > da...@davemloft.net; > >> gre...@linuxfoundation.org; Jiri Pirko <j...@mellanox.com> > >> Subject: RE: [RFC net-next 0/8] Introducing subdev bus and devlink > >> extension > >> > >> > >> > >>> -----Original Message----- > >>> From: linux-kernel-ow...@vger.kernel.org <linux-kernel- > >>> ow...@vger.kernel.org> On Behalf Of Parav Pandit > >>> Sent: Tuesday, March 5, 2019 5:17 PM > >>> To: Kirti Wankhede <kwankh...@nvidia.com>; Jakub Kicinski > >>> <jakub.kicin...@netronome.com> > >>> Cc: Or Gerlitz <gerlitz...@gmail.com>; netdev@vger.kernel.org; > >>> linux- ker...@vger.kernel.org; michal.l...@markovi.net; > >>> da...@davemloft.net; gre...@linuxfoundation.org; Jiri Pirko > >>> <j...@mellanox.com> > >>> Subject: RE: [RFC net-next 0/8] Introducing subdev bus and devlink > >>> extension > >>> > >>> Hi Kirti, > >>> > >>>> -----Original Message----- > >>>> From: Kirti Wankhede <kwankh...@nvidia.com> > >>>> Sent: Tuesday, March 5, 2019 4:40 PM > >>>> To: Parav Pandit <pa...@mellanox.com>; Jakub Kicinski > >>>> <jakub.kicin...@netronome.com> > >>>> Cc: Or Gerlitz <gerlitz...@gmail.com>; netdev@vger.kernel.org; > >>>> linux- ker...@vger.kernel.org; michal.l...@markovi.net; > >>>> da...@davemloft.net; gre...@linuxfoundation.org; Jiri Pirko > >>>> <j...@mellanox.com> > >>>> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink > >>>> extension > >>>> > >>>> > >>>> > >>>>> I am novice at mdev level too. mdev or vfio mdev. > >>>>> Currently by default we bind to same vendor driver, but when it > >>>>> was > >>>> created as passthrough device, vendor driver won't create netdevice > >>>> or rdma device for it. > >>>>> And vfio/mdev or whatever mature available driver would bind at > >>>>> that > >>>> point. > >>>>> > >>>> > >>>> Using mdev framework, if you want to partition a physical device > >>>> into multiple logic devices, you can bind those devices to same > >>>> vendor driver through vfio-mdev, where as if you want to > >>>> passthrough the device bind it to vfio-pci. If I understand > >>>> correctly, that is what you are > >>> looking for. > >>>> > >>>> > >>> We cannot bind a whole PCI device to vfio-pci, reason is, A given > >>> PCI device has existing protocol devices on it such as netdevs and rdma > dev. > >>> This device is partitioned while those protocol devices exist and > >>> mlx5_core, mlx5_ib drivers are loaded on it. > >>> And we also need to connect these objects rightly to eswitch exposed > >>> by devlink interface (net/core/devlink.c) that supports eswitch > >>> binding, health, registers, parameters, ports support. > >>> It also supports existing PCI VFs. > >>> > >>> I don’t think we want to replicate all of this again in mdev subsystem > >>> [1]. > >>> > >>> [1] > >>> https://www.kernel.org/doc/Documentation/vfio-mediated-device.txt > >>> > >>> So devlink interface to migrate users from managing VFs to non_VF > >>> sub device is natural progression. > >>> > >>> However, in future, I believe we would be creating mediated devices > >>> on user request, to use mdev modules and map them to VM. > >>> > >>> Also 'mdev_bus' is created as a class and not as a bus. This limits > >>> to not use devlink interface whose handle is bus+device name. > >>> > >>> So one option is to change mdev from class to bus. > >>> devlink will create mdevs on the bus, mdev driver can probe these > >>> devices on host system by default. > >>> And if told to do passthrough, a different driver exposes them to VM. > >>> How feasible is this? > >>> > >> Wait, I do see a mdev bus and mdevs are created on this bus using > >> mdev_device_create(). > >> So how about we create mdevs on this bus using devlink, instead of sysfs? > >> And driver side on host gets the mdev_register_driver()->probe()? > >> > > > > Thinking more and reviewing more mdev code, I believe mdev fits this > > need a lot better than new subdev bus, mfd, platform device, or devlink > subport. > > For coming future, to map this sub device (mdev) to VM will also be easier > by using mdev bus. > > > > Thanks for taking close look at mdev code. > > Assigning mdev to VM support is already in place, QEMU and libvirt have > support to assign mdev device to VM. > > > I also believe we can use the sysfs interface for mdev life cycle. > > Here when mdev are created it will register as devlink instance and > > will be able to query/config parameters before driver probe the device. > > (instead of having life cycle via devlink) > > > > Few enhancements would be needed for mdev side. > > 1. making iommu optional. > > Currently mdev devices are not IOMMU aware, vendor driver is responsible > for programming IOMMU for mdev device, if required. > IOMMU aware mdev device patch set is almost reviewed and ready to get > pulled. This is optional, vendor driver have to decide whether mdev device > should be associated with its parents IOMMU or not. I'm testing it and I > think Alex is on vacation and this will get pulled when Alex will be back from > vacation. > https://lwn.net/Articles/779650/ > > > 2. configuring mdev device parameters during creation time > > > > Mdev framework provides a way to define multiple types for creation > through sysfs. You can define multiple types rather than having creation > time parameter and on creation accordingly update 'available_instances'. > Mdev also provides a way to provide vendor-specific-attributes for parent > physical device as well as for created mdev device. You can add sysfs > interface to get input parameters for a mdev device which can be used by > vendor driver when open() on that mdev device is called. > > Thanks, > Kirti
Yes. I got my patches to adapt to mdev way. Will be posting RFC v2 soon. Will wait for a day to receive more comments/views from Greg and others. As I explained in this cover-letter and discussion, First use case is to create and use mdevs in the host (and not in VM). Later on, I am sure once we have mdevs available, VM users will likely use it. So, mlx5_core driver will have two components as starting point. 1. drivers/net/ethernet/mellanox/mlx5/core/mdev/mdev.c This is mdev device life cycle driver which will do, mdev_register_device() and implements mlx5_mdev_ops. 2. drivers/net/ethernet/mellanox/mlx5/core/mdev/mdev_driver.c This is mdev device driver which does mdev_register_driver() and probe() creates netdev by heavily reusing existing code of the PF device. These drivers will not be placed under drivers/vfio/mdev, because this is not a vfio driver. This is fine, right? Given that this is net driver, we will be submitting patches, through netdev mailing list through Dave Miller's net-next tree. And CC k...@vger.kernel.org, you and others as usual. Are you ok, merging code this way as mdev device creator and mdev driver. Yes?