On Fri, 21 Aug 2020 13:30:21 +0300 Ido Schimmel wrote:
> On Thu, Aug 20, 2020 at 09:09:42AM -0700, Jakub Kicinski wrote:
> > On Thu, 20 Aug 2020 08:35:25 -0600 David Ahern wrote:
> > > On 8/19/20 12:07 PM, Jakub Kicinski wrote:
> > > > I don't have a great way forward in mind, sadly. All I can think of is
> > > > that we should try to create more well defined interfaces and steer
> > > > away from free-form ones.
> > >
> > > There is a lot of value in free-form too.
> >
> > On Tue, 18 Aug 2020 20:35:01 -0700 Jakub Kicinski wrote:
> > > It's a question of interface, not the value of exposed data.
> >
> > > > Example, here if the stats are vxlan decap/encap/error - we should
> > > > expose that from the vxlan module. That way vxlan module defines one
> > > > set of stats for everyone.
> > > >
> > > > In general unless we attach stats to the object they relate to, we will
> > > > end up building parallel structures for exposing statistics from the
> > > > drivers. I posted a set once which was implementing hierarchical stats,
> > > > but I've abandoned it for this reason.
> > > > > [...]
> > > > IDK. I just don't feel like this is going to fly, see how many names
> > > > people invented for the CRC error statistic in ethtool -S, even tho
> > > > there is a standard stat for that! And users are actually parsing the
> > > > output of ethtool -S to get CRC stats because (a) it became the go-to
> > > > place for NIC stats and (b) some drivers forget to report in the
> > > > standard place.
> > > >
> > > > The cover letter says this set replaces the bad debugfs with a good,
> > > > standard API. It may look good and standard for _vendors_ because they
> > > > will know where to dump their counters, but it makes very little
> > > > difference for _users_. If I have to parse names for every vendor I use,
> > > > I can as well add a per-vendor debugfs path to my script.
> > > >
> > > > The bar for implementation-specific driver stats has to be high.
> > >
> > > My take away from this is you do not like the names - the strings side
> > > of it.
> > >
> > > Do you object to the netlink API? The netlink API via devlink?
> > >
> > > 'perf' has json files to describe and document counters
> > > (tools/perf/pmu-events). Would something like that be acceptable as a
> > > form of in-tree documentation of counters? (vs Documentation/networking
> > > or URLs like
> > > https://community.mellanox.com/s/article/understanding-mlx5-ethtool-counters)
> >
> > Please refer to what I said twice now about the definition of the stats
> > exposed here belonging with the VxLAN code, not the driver.
>
> Please refer to the changelog:
>
> "The Spectrum ASICs have a single hardware VTEP that is able to perform
> VXLAN encapsulation and decapsulation. The VTEP is logically mapped by
> mlxsw to the multiple VXLAN netdevs that are using it. Exposing the
> counters of this VTEP via the multiple VXLAN netdevs that are using it
> would be both inaccurate and confusing for users.
>
> Instead, expose the counters of the VTEP via devlink-metric. Note that
> Spectrum-1 supports a different set of counters compared to newer ASICs
> in the Spectrum family."
I read your cover letter; I didn't say the metrics have to be reported
on particular netdevs.

> Hardware implementations will rarely fit 1:1 to the nice and discrete
> software implementations that they try to accelerate. The purpose of
> this API is exposing metrics specific to these hardware implementations.
> This results in better visibility which can be leveraged for faster
> debugging and more thorough testing.
>
> The reason I came up with this interface is not the specific VXLAN
> metrics that bother you, but a new platform we are working on. It uses
> the ASIC as a cache that refers lookups to an external device in case of
> cache misses. It is completely transparent to user space (you get better
> scale), but the driver is very much aware of this stuff as it needs to
> insert objects (e.g., routes) in a way that will minimize cache misses.
> Just checking that ping works is hardly enough. We must be able to read
> the cache counters to ensure we do not see cache misses when we do not
> expect them.
>
> As another example, consider the algorithmic TCAM implementation we have
> in Spectrum-2 for ACLs [1]. While a user simply adds / deletes filters,
> the driver needs to jump through multiple hops in order to program them
> in a way that will result in a better scale and reduced latency. We
> currently do not have an interface through which we can expose metrics
> related to this specific implementation.
>
> [1]
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=756cd36626f773e9a72a39c1dd12da4deacfacdf
>
> > > > Okay, fair. I just think that in datacenter deployments we are way
> > > > closer to the SDK model than people may want to admit.
> > >
> > > I do not agree with that; the SDK model means you *must* use vendor code
> > > to make something work. Your argument here is about labels for stats and
> > > an understanding of their meaning.
> >
> > Sure, no "must" for passing packets, but you "must" use vendor tooling
> > to operate a fleet.
> > Since everybody already has vendor tools what value does this API add?
> We don't have any "vendor tools" to get this information. Our team is
> doing everything it possibly can in order to move away from such an
> approach.

For clarity, I wasn't speaking about your switches; sadly, I have no
experience with those.

> > I still need per vendor logic. Let's try to build APIs which will
> > actually make user's life easier, which users will want to switch to.
> Developers are also users and they should be able to read whatever
> information they need from the device in order to help them do their
> work. You have a multitude of tools (e.g., kprobes, tracepoints) to get
> better visibility into the software data path. Commonality is not a
> reason to be blind as a bat when looking into the hardware data path.

How many times do I have to say that I'm not arguing against the value
of the data? If you open up this interface either someone will police
it, or it will become a dumpster.
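
FWIW, on Dave's pmu-events question further up: the files under
tools/perf/pmu-events are arrays of JSON objects, one entry per counter.
An abridged sketch from memory (not a verbatim entry; the exact field
set varies per architecture) looks roughly like:

  [
    {
      "EventName": "INST_RETIRED.ANY",
      "EventCode": "0xC0",
      "BriefDescription": "Number of instructions retired.",
      "PublicDescription": "Counts the number of instructions retired from execution."
    }
  ]

Something in that shape could in principle carry a name plus short and
long descriptions for every counter a driver exposes; whether in-tree
documentation alone is enough to keep the names from diverging is
exactly the disagreement above.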