Mon, Jun 27, 2016 at 03:02:31PM CEST, ro...@cumulusnetworks.com wrote:
>[resending... my previous reply sent some non-text content]
>
>On Sun, Jun 26, 2016 at 11:51 PM, Jiri Pirko <j...@resnulli.us> wrote:
>>
>> Mon, Jun 27, 2016 at 04:53:53AM CEST, ro...@cumulusnetworks.com wrote:
>> >On Sun, Jun 26, 2016 at 11:15 AM, Jiri Pirko <j...@resnulli.us> wrote:
>> >> Sun, Jun 26, 2016 at 08:06:40PM CEST, ro...@cumulusnetworks.com wrote:
>> >>>On 6/26/16, 2:33 AM, Jiri Pirko wrote:
>> >>>> Sat, Jun 25, 2016 at 05:50:59PM CEST, ro...@cumulusnetworks.com wrote:
>> >>>>> On Thu, Jun 23, 2016 at 8:40 AM, Jiri Pirko <j...@resnulli.us> wrote:
>> >>>>>> Thu, Jun 23, 2016 at 05:11:26PM CEST, anurad...@cumulusnetworks.com wrote:
>> >>>>>>>>>>> We can't separate CPU and HW stats there. In some cases (or ASICs)
>> >>>>>>>>>>> HW counters do not include CPU generated packets... you will have
>> >>>>>>>>>>> to add CPU generated pkt counters to the hw counters for such
>> >>>>>>>>>>> virtual device stats.
>> >>>>>>>>>> Can you please provide an example of how that could happen?
>> >>>>>>>>> An example is the bridge vlan stats I mention below. These are
>> >>>>>>>>> usually counted by attaching hw virtual counter resources. And CPU
>> >>>>>>>>> generated packets in some cases may be set up to bypass the ASIC
>> >>>>>>>>> pipeline because the CPU has already made the required decisions.
>> >>>>>>>>> So, they may not be counted by such hw virtual counters.
>> >>>>>>>> Bypass the ASIC? How do the packets get on the wire?
>> >>>>>>>>
>> >>>>>>> Bypass the "forwarding pipeline" in the ASIC, that is. Obviously the
>> >>>>>>> ASIC ships the CPU generated packet out of the switch/front-panel
>> >>>>>>> port. Continuing Roopa's example of vlan netdev stats... To get the
>> >>>>>>> HW stats, counters are typically tied to the ingress and egress vlan
>> >>>>>>> hw entries.
>> >>>>>>> All the incoming packets are subject to the ingress vlan lookup
>> >>>>>>> irrespective of whether they get punted to the CPU or forwarded to
>> >>>>>>> another front panel port. In that case the ingress HW stats do
>> >>>>>>> represent all packets. However, for CPU originated packets the
>> >>>>>>> egress vlan lookups are bypassed in the ASIC (this is a common
>> >>>>>>> forwarding option in most ASICs) and the packet is shipped as is
>> >>>>>>> out of the front-panel port specified by the CPU. Which means these
>> >>>>>>> packets will NOT be counted against the egress VLAN HW counter;
>> >>>>>>> hence the need for summation.
>> >>>>>> The driver will know about this, and will provide the stats
>> >>>>>> accordingly to the core. Who else than the driver should resolve
>> >>>>>> this?
>> >>>>>>
>> >>>>> The point was/is that there should be only two categories:
>> >>>>>
>> >>>>> 1) The base default stats: can contain 'only sw', 'only hw' or 'a
>> >>>>> summation of hw and sw' in some cases. The user does not care about
>> >>>>> the breakdown.
>> >>>>>
>> >>>>> 2) Everything else falls into the second category: a driver provided
>> >>>>> breakdown of stats for easier debugging. This today is ethtool stats,
>> >>>>> and we can have an equivalent nested attribute for this in the new
>> >>>>> stats api. Let's call it IFLA_STATS_LINK_DRIVER, or you pick a name.
>> >>>>> Let's make it nested and extensible (like ethtool is) and the driver
>> >>>>> can expose any kind of stats there. I.e. let's move the stats you are
>> >>>>> proposing to this category of stats... instead of introducing a third
>> >>>>> category, 'SW stats'.
>> >>>> What you are proposing is essentially what our patchset does. We
>> >>>> expose 2 sets of stats: hw and pure sw. hw includes all; the driver
>> >>>> will take care of it because it knows what is going on in hw.
>> >>>The splitting into hw and sw is causing some confusion with respect
>>
>> I still don't get why you are talking about a split :( I see no split.
>> >>>to existing stats and will be confusing for future stats. And I am
>> >>>not sure how many users would prefer the split this way.
>> >>>So, instead of doing the split, I think we should at this time
>> >>>introduce driver specific stats (like ethtool) as a nested netlink
>> >>>attribute.
>> >>
>> >> The default netlink stats should be hw (or accumulated, as you call it).
>> >> The reason is to avoid confusion for existing apps. Another attribute is
>> >> possible for more break-out stats - that is what this patchset is doing.
>> >>
>> >> Ethtool stats are wrong and useless for apps as they are driver-specific.
>> >
>> >Apps only care about overall stats. That's the aggregate stats provided
>> >by the default netlink netdev api to the user... which already exists.
>> >
>> >They don't care about your new breakdown either.
>>
>> Agreed. That is what our patchset is doing.
>>
>> >Breakdown of stats is used for debugging, and that's what ethtool stats
>> >provide.
>> >
>> >>>> Btw, mirroring random string stats into netlink is not a good idea IMO.
>> >>>Any reason you say that? I am thinking it would be much easier with
>> >>>netlink. Keeping it simple, it is a nested attribute with a stat-name
>> >>>and value pair:
>> >>>
>> >>>struct stat {
>> >>>        char stats_name[STATS_NAME_LEN]; /* STATS_NAME_LEN = 32 */
>> >>>        __u64 stat;
>> >>>};
>> >>
>> >> No please. This should be a well defined generic group of stats.
>> >> Driver-specific names/stats are wrong.
>> >
>> >They are meant for debugging. Are you saying the new stats api should
>> >not contain 'ethtool like' stats?
>> >
>> >Ethtool stats are very valuable today. They are extensible.
>> >They cannot be made generic, and they are specific to a hardware or use
>> >case.
>> >
>> >We use it for our switch port stats too. The base aggregate stats are
>> >summed up and provided as the default netdev stats. Via ethtool we
>> >provide a lot more hardware specific breakdown.
>>
>> Leave it in ethtool then. I really think it is not a good idea to put
>> random named-stats in netlink.
>
>The point of the stats api is to bring all stats together.
>Ethtool stats are very valuable, so there is no point leaving them out.
>They will be in here sooner or later.
>
>When there is already a discussion of moving ethtool to netlink, I am
>not sure I understand the hesitation on moving ethtool stats to netlink.
It's not well defined. That's the issue. It's "debugfs"...

>>
>> This patchset uses well defined values for slowpath (/sw/cpu) stats.
>> That is, I believe, the only way to do this.
>>
>It seems to be your preferred way. I don't think it is the only way to
>do it. It can very well be part of the nested breakdown stats I am
>talking about.

We are using an existing well-defined structure. Very easy and clean.