On Sat, Jul 28, 2018 at 7:06 PM, Bjorn Helgaas <helg...@kernel.org> wrote:
> On Thu, Jul 26, 2018 at 07:00:20AM -0700, Alexander Duyck wrote:
> > On Thu, Jul 26, 2018 at 12:14 AM, Jiri Pirko <j...@resnulli.us> wrote:
> > > Thu, Jul 26, 2018 at 02:43:59AM CEST, jakub.kicin...@netronome.com wrote:
> > > > On Wed, 25 Jul 2018 08:23:26 -0700, Alexander Duyck wrote:
> > > > > On Wed, Jul 25, 2018 at 5:31 AM, Eran Ben Elisha wrote:
> > > > > > On 7/24/2018 10:51 PM, Jakub Kicinski wrote:
> > > > > > > > > The devlink params haven't been upstream even for a full
> > > > > > > > > cycle and already you guys are starting to use them to
> > > > > > > > > configure standard features like queuing.
> > > > > > > >
> > > > > > > > We developed the devlink params in order to support
> > > > > > > > non-standard configuration only. And for non-standard, there
> > > > > > > > are generic and vendor-specific options.
> > > > > > >
> > > > > > > I thought it was developed for performing non-standard and
> > > > > > > possibly vendor-specific configuration. Look at
> > > > > > > DEVLINK_PARAM_GENERIC_* for examples of well-justified generic
> > > > > > > options for which we have no other API. The vendor mlx4 options
> > > > > > > look fairly vendor-specific if you ask me, too.
> > > > > > >
> > > > > > > Configuring queuing has an API. The question is whether it is
> > > > > > > acceptable to enter the risky territory of controlling offloads
> > > > > > > via devlink parameters, or whether we would rather make vendors
> > > > > > > take the time and effort to model things to (a subset of) the
> > > > > > > existing APIs. The HW never fits the APIs perfectly.
> > > > > >
> > > > > > I understand what you meant here; I would like to highlight that
> > > > > > this mechanism was not meant to handle SRIOV, representors, etc.
> > > > > > The vendor-specific configuration suggested here is to handle a
> > > > > > congestion state in a multi-host environment (which includes a PF
> > > > > > and multiple VFs per host), where one host is not aware of the
> > > > > > other hosts and each is running on its own PCI function/driver.
> > > > > > It is a device working-mode configuration.
> > > > > >
> > > > > > This couldn't fit into any existing API, thus creating this
> > > > > > vendor-specific unique API is needed.
> > > > >
> > > > > If we are just going to start creating devlink interfaces for every
> > > > > one-off option a device wants to add, why did we even bother trying
> > > > > to prevent drivers from using sysfs? This just feels like we are
> > > > > back to the same arguments we had back in the day with it.
> > > > >
> > > > > I feel like the bigger question here is whether devlink is how we
> > > > > are going to deal with all PCIe-related features going forward, or
> > > > > whether we should start looking at creating a new interface/tool
> > > > > for PCI/PCIe-related features. My concern is that we have already
> > > > > had features such as DMA Coalescing that didn't really fit into
> > > > > anything, and now we are starting to see other things related to
> > > > > DMA and PCIe bus credits. I'm wondering if we shouldn't start
> > > > > looking at a tool/interface to configure all the PCIe-related
> > > > > features such as interrupts, error reporting, DMA configuration,
> > > > > power management, etc. Maybe we could even look at sharing it
> > > > > across subsystems and include things like storage, graphics, and
> > > > > other subsystems in the conversation.
> > > >
> > > > Agreed, for actual PCIe configuration (i.e. not ECN marking) we do
> > > > need to build up an API. Sharing it across subsystems would be very
> > > > cool!
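(For anyone not familiar with the mechanism being argued about above: a
driver-specific devlink parameter is registered roughly as sketched below,
using the devlink params API as it stood at the time. This is only an
illustration, not the actual mlx5 code; the "congestion_mode" name and all
example_* identifiers are made up.)

#include <linux/kernel.h>
#include <net/devlink.h>

enum example_devlink_param_id {
        /* driver-specific IDs must come after the generic ones */
        EXAMPLE_DEVLINK_PARAM_ID_CONG_MODE = DEVLINK_PARAM_GENERIC_ID_MAX + 1,
};

static int example_cong_mode_get(struct devlink *devlink, u32 id,
                                 struct devlink_param_gset_ctx *ctx)
{
        ctx->val.vu8 = 0;       /* query the current mode from the device here */
        return 0;
}

static int example_cong_mode_set(struct devlink *devlink, u32 id,
                                 struct devlink_param_gset_ctx *ctx)
{
        /* push ctx->val.vu8 to the device here */
        return 0;
}

static const struct devlink_param example_devlink_params[] = {
        DEVLINK_PARAM_DRIVER(EXAMPLE_DEVLINK_PARAM_ID_CONG_MODE,
                             "congestion_mode", DEVLINK_PARAM_TYPE_U8,
                             BIT(DEVLINK_PARAM_CMODE_RUNTIME),
                             example_cong_mode_get, example_cong_mode_set,
                             NULL),
};

/* called from the driver's probe path */
static int example_register_params(struct devlink *devlink)
{
        return devlink_params_register(devlink, example_devlink_params,
                                       ARRAY_SIZE(example_devlink_params));
}

At runtime such a knob would then be driven from userspace with the devlink
tool, e.g. "devlink dev param set pci/0000:82:00.0 name congestion_mode
value 1 cmode runtime" (device address made up as well).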
>
> I read the thread (starting at [1], for anybody else coming in late) and
> I see this has something to do with "configuring outbound PCIe buffers",
> but I haven't seen the connection to PCIe protocol or features, i.e., I
> can't connect this to anything in the PCIe spec.
>
> Can somebody help me understand how the PCI core is relevant? If there's
> some connection with a feature defined by PCIe, or if it affects the PCIe
> transaction protocol somehow, I'm definitely interested in this. But if
> this only affects the data transferred over PCIe, i.e., the data payloads
> of PCIe TLP packets, then I'm not sure why the PCI core should care.

As you wrote, this is not a PCIe feature, nor does it affect the PCIe
transaction protocol. Due to a hardware limitation in the current device,
we have enabled a workaround in hardware. This mode is proprietary and not
relevant to other PCIe devices, so it is set using a driver-specific
devlink parameter.

> > > I wonder how come there isn't such an API in place already. Or is
> > > there? If not, do you have any idea what it should look like? Should
> > > it be an extension of the existing PCI uapi or something completely
> > > new? It would probably be good to loop some PCI people in...
> >
> > The closest thing I can think of in terms of answering your questions
> > as to why we haven't seen anything like that would be setpci.
> > Basically, with that tool you can go through the PCI configuration
> > space and update any piece you want. The problem is it can have effects
> > on the driver, and I don't recall there ever being any sort of
> > notification mechanism added to make a driver aware of configuration
> > updates.
>
> setpci is a development and debugging tool, not something we should use
> as the standard way of configuring things. Use of setpci should probably
> taint the kernel because the PCI core configures features like MPS, ASPM,
> AER, etc., based on the assumption that nobody else is changing things in
> PCI config space.
>
> > As far as the interface, I don't know if we would want to use something
> > like netlink or look at something completely new.
> >
> > I've gone ahead and added the linux-pci mailing list to the thread.
> >
> > - Alex
>
> [1] https://lkml.kernel.org/r/20180719010107.22363-11-sae...@mellanox.com
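(To make the setpci point concrete: within the kernel, PCIe settings are
expected to go through PCI core helpers so the core's bookkeeping stays
consistent, whereas an out-of-band setpci write bypasses that entirely. The
function below is an invented example of the in-kernel pattern, not code
from this thread; the 512-byte value is arbitrary.)

#include <linux/pci.h>

/* Hypothetical driver snippet: cap the Max Read Request Size via the PCI
 * core helper rather than writing the Device Control register directly.
 */
static int example_cap_mrrs(struct pci_dev *pdev)
{
        if (pcie_get_readrq(pdev) > 512)
                return pcie_set_readrq(pdev, 512);      /* bytes */
        return 0;
}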