CC += Alex On 3/6/2019 11:12 AM, Parav Pandit wrote: > Hi Kirti, > >> -----Original Message----- >> From: Kirti Wankhede <kwankh...@nvidia.com> >> Sent: Tuesday, March 5, 2019 9:51 PM >> To: Parav Pandit <pa...@mellanox.com>; Jakub Kicinski >> <jakub.kicin...@netronome.com> >> Cc: Or Gerlitz <gerlitz...@gmail.com>; netdev@vger.kernel.org; linux- >> ker...@vger.kernel.org; michal.l...@markovi.net; da...@davemloft.net; >> gre...@linuxfoundation.org; Jiri Pirko <j...@mellanox.com> >> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension >> >> >> >> On 3/6/2019 6:14 AM, Parav Pandit wrote: >>> Hi Greg, Kirti, >>> >>>> -----Original Message----- >>>> From: Parav Pandit >>>> Sent: Tuesday, March 5, 2019 5:45 PM >>>> To: Parav Pandit <pa...@mellanox.com>; Kirti Wankhede >>>> <kwankh...@nvidia.com>; Jakub Kicinski >> <jakub.kicin...@netronome.com> >>>> Cc: Or Gerlitz <gerlitz...@gmail.com>; netdev@vger.kernel.org; linux- >>>> ker...@vger.kernel.org; michal.l...@markovi.net; >> da...@davemloft.net; >>>> gre...@linuxfoundation.org; Jiri Pirko <j...@mellanox.com> >>>> Subject: RE: [RFC net-next 0/8] Introducing subdev bus and devlink >>>> extension >>>> >>>> >>>> >>>>> -----Original Message----- >>>>> From: linux-kernel-ow...@vger.kernel.org <linux-kernel- >>>>> ow...@vger.kernel.org> On Behalf Of Parav Pandit >>>>> Sent: Tuesday, March 5, 2019 5:17 PM >>>>> To: Kirti Wankhede <kwankh...@nvidia.com>; Jakub Kicinski >>>>> <jakub.kicin...@netronome.com> >>>>> Cc: Or Gerlitz <gerlitz...@gmail.com>; netdev@vger.kernel.org; >>>>> linux- ker...@vger.kernel.org; michal.l...@markovi.net; >>>>> da...@davemloft.net; gre...@linuxfoundation.org; Jiri Pirko >>>>> <j...@mellanox.com> >>>>> Subject: RE: [RFC net-next 0/8] Introducing subdev bus and devlink >>>>> extension >>>>> >>>>> Hi Kirti, >>>>> >>>>>> -----Original Message----- >>>>>> From: Kirti Wankhede <kwankh...@nvidia.com> >>>>>> Sent: Tuesday, March 5, 2019 4:40 PM >>>>>> To: Parav Pandit <pa...@mellanox.com>; Jakub Kicinski >>>>>> <jakub.kicin...@netronome.com> >>>>>> Cc: Or Gerlitz <gerlitz...@gmail.com>; netdev@vger.kernel.org; >>>>>> linux- ker...@vger.kernel.org; michal.l...@markovi.net; >>>>>> da...@davemloft.net; gre...@linuxfoundation.org; Jiri Pirko >>>>>> <j...@mellanox.com> >>>>>> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink >>>>>> extension >>>>>> >>>>>> >>>>>> >>>>>>> I am novice at mdev level too. mdev or vfio mdev. >>>>>>> Currently by default we bind to same vendor driver, but when it >>>>>>> was >>>>>> created as passthrough device, vendor driver won't create netdevice >>>>>> or rdma device for it. >>>>>>> And vfio/mdev or whatever mature available driver would bind at >>>>>>> that >>>>>> point. >>>>>>> >>>>>> >>>>>> Using mdev framework, if you want to partition a physical device >>>>>> into multiple logic devices, you can bind those devices to same >>>>>> vendor driver through vfio-mdev, where as if you want to >>>>>> passthrough the device bind it to vfio-pci. If I understand >>>>>> correctly, that is what you are >>>>> looking for. >>>>>> >>>>>> >>>>> We cannot bind a whole PCI device to vfio-pci, reason is, A given >>>>> PCI device has existing protocol devices on it such as netdevs and rdma >> dev. >>>>> This device is partitioned while those protocol devices exist and >>>>> mlx5_core, mlx5_ib drivers are loaded on it. >>>>> And we also need to connect these objects rightly to eswitch exposed >>>>> by devlink interface (net/core/devlink.c) that supports eswitch >>>>> binding, health, registers, parameters, ports support. >>>>> It also supports existing PCI VFs. >>>>> >>>>> I don’t think we want to replicate all of this again in mdev subsystem >>>>> [1]. >>>>> >>>>> [1] >>>>> https://www.kernel.org/doc/Documentation/vfio-mediated-device.txt >>>>> >>>>> So devlink interface to migrate users from managing VFs to non_VF >>>>> sub device is natural progression. >>>>> >>>>> However, in future, I believe we would be creating mediated devices >>>>> on user request, to use mdev modules and map them to VM. >>>>> >>>>> Also 'mdev_bus' is created as a class and not as a bus. This limits >>>>> to not use devlink interface whose handle is bus+device name. >>>>> >>>>> So one option is to change mdev from class to bus. >>>>> devlink will create mdevs on the bus, mdev driver can probe these >>>>> devices on host system by default. >>>>> And if told to do passthrough, a different driver exposes them to VM. >>>>> How feasible is this? >>>>> >>>> Wait, I do see a mdev bus and mdevs are created on this bus using >>>> mdev_device_create(). >>>> So how about we create mdevs on this bus using devlink, instead of sysfs? >>>> And driver side on host gets the mdev_register_driver()->probe()? >>>> >>> >>> Thinking more and reviewing more mdev code, I believe mdev fits this >>> need a lot better than new subdev bus, mfd, platform device, or devlink >> subport. >>> For coming future, to map this sub device (mdev) to VM will also be easier >> by using mdev bus. >>> >> >> Thanks for taking close look at mdev code. >> >> Assigning mdev to VM support is already in place, QEMU and libvirt have >> support to assign mdev device to VM. >> >>> I also believe we can use the sysfs interface for mdev life cycle. >>> Here when mdev are created it will register as devlink instance and >>> will be able to query/config parameters before driver probe the device. >>> (instead of having life cycle via devlink) >>> >>> Few enhancements would be needed for mdev side. >>> 1. making iommu optional. >> >> Currently mdev devices are not IOMMU aware, vendor driver is responsible >> for programming IOMMU for mdev device, if required. >> IOMMU aware mdev device patch set is almost reviewed and ready to get >> pulled. This is optional, vendor driver have to decide whether mdev device >> should be associated with its parents IOMMU or not. I'm testing it and I >> think Alex is on vacation and this will get pulled when Alex will be back >> from >> vacation. >> https://lwn.net/Articles/779650/ >> >>> 2. configuring mdev device parameters during creation time >>> >> >> Mdev framework provides a way to define multiple types for creation >> through sysfs. You can define multiple types rather than having creation >> time parameter and on creation accordingly update 'available_instances'. >> Mdev also provides a way to provide vendor-specific-attributes for parent >> physical device as well as for created mdev device. You can add sysfs >> interface to get input parameters for a mdev device which can be used by >> vendor driver when open() on that mdev device is called. >> >> Thanks, >> Kirti > > Yes. I got my patches to adapt to mdev way. Will be posting RFC v2 soon. > Will wait for a day to receive more comments/views from Greg and others. > > As I explained in this cover-letter and discussion, > First use case is to create and use mdevs in the host (and not in VM). > Later on, I am sure once we have mdevs available, VM users will likely use it. > > So, mlx5_core driver will have two components as starting point. > > 1. drivers/net/ethernet/mellanox/mlx5/core/mdev/mdev.c > This is mdev device life cycle driver which will do, mdev_register_device() > and implements mlx5_mdev_ops. > Ok. I would suggest not use mdev.c file name, may be add device name, something like mlx_mdev.c or vfio_mlx.c
> 2. drivers/net/ethernet/mellanox/mlx5/core/mdev/mdev_driver.c > This is mdev device driver which does mdev_register_driver() > and probe() creates netdev by heavily reusing existing code of the PF device. > These drivers will not be placed under drivers/vfio/mdev, because this is not > a vfio driver. > This is fine, right? > I'm not too familiar with netdev, but can you create netdev on open() call on mlx mdev device? Then you don't have to write mdev device driver. > Given that this is net driver, we will be submitting patches, > through netdev mailing list through Dave Miller's net-next tree. > And CC k...@vger.kernel.org, you and others as usual. > Are you ok, merging code this way as mdev device creator and mdev driver. > Yes? > Keep Alex and me in loop. Thanks, Kirti