On 1/1/2019 3:47 AM, Jakub Kicinski wrote: > On Mon, 31 Dec 2018 16:31:54 +0200, Eran Ben Elisha wrote: >> The health mechanism is targeted for Real Time Alerting, in order to know >> when >> something bad had happened to a PCI device >> - Provide alert debug information >> - Self healing >> - If problem needs vendor support, provide a way to gather all needed >> debugging >> information. >> >> The main idea is to unify and centralize driver health reports in the >> generic devlink instance and allow the user to set different >> attributes of the health reporting and recovery procedures. >> >> The devlink health reporter: >> Device driver creates a "health reporter" per each error/health type. >> Error/Health type can be a known/generic (eg pci error, fw error, rx/tx >> error) >> or unknown (driver specific). >> For each registered health reporter a driver can issue error/health reports >> asynchronously. All health reports handling is done by devlink. >> Device driver can provide specific callbacks for each "health reporter", e.g. >> - Recovery procedures >> - Diagnostics and object dump procedures >> - OOB initial attributes >> Different parts of the driver can register different types of health >> reporters >> with different handlers. >> >> Once an error is reported, devlink health will do the following actions: >> * A log is being send to the kernel trace events buffer >> * Health status and statistics are being updated for the reporter instance >> * Object dump is being taken and saved at the reporter instance (as long >> as >> there is no other Objdump which is already stored) >> * Auto recovery attempt is being done. Depends on: >> - Auto-recovery configuration >> - Grace period vs. time passed since last recover >> >> The user interface: >> User can access/change each reporter attributes and driver specific callbacks >> via devlink, e.g per error type (per health reporter) >> - Configure reporter's generic attributes (like: Disable/enable auto >> recovery) >> - Invoke recovery procedure >> - Run diagnostics >> - Object dump > > Thanks for continuing this work! :) > >> The devlink health interface (via netlink): >> DEVLINK_CMD_HEALTH_REPORTER_GET >> Retrieves status and configuration info per DEV and reporter. >> DEVLINK_CMD_HEALTH_REPORTER_SET >> Allows reporter-related configuration setting. >> DEVLINK_CMD_HEALTH_REPORTER_RECOVER >> Triggers a reporter's recovery procedure. >> DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE >> Retrieves diagnostics data from a reporter on a device. >> DEVLINK_CMD_HEALTH_REPORTER_OBJDUMP_GET > > The addition of "objdump" and its marshalling is a bit disappointing. > It seemed to me when region snapshots were added that they would serve > this exact purpose. Taking a region snapshot or "core dump", if you > will, after something went south. Now it's confusing to have two > mechanisms serving the same purpose.
The motivation here was that the driver can provide reporters to its sub-modules, such that each reporter will be able to provide all needed info and recover methods to face run time errors. The implementation of the objdump function is in the hands of the reporter developer, and he can dump whatever he thinks it is needed. Keep in mind that a driver can have many reporters (TX, RX, FW, command interface, etc). For most of the reporters, there is important information in the driver which can not be fetched with region snapshot (intended for memory fetching only). For SW info, driver shall build the info and send it interpreted (unlike all dumps / region available mechanisms) I have in plans to extend the TX reporter to have objdump method in which I will pass many SQ SW attributes that can be very handy in a debug session. > > It's unclear from quick reading of the code how if the buffers get > timestamped. Can you report more than one? The timestamp which devlink health reports on, is the timestamp in which it got the buffers filled by the driver. Every dump/diagnose has one ts. Per reporter, it is possible to store up to one dump. only clear command can remove it and makes the reporter ready to fetch a new objdump. > > About the marshalling, I'm not sure it belongs in the kernel. There is > precedent for adding interpretation of FW blobs in user space (ethtool). > IMHO it's a more scalable approach, if we want to avoid having 100 kLoC > drivers. Amount of code you add to print the simple example from last > patch is not inspiring confidence. The idea was to provide the developer the ability to create "tree-like" of information, it is needed when you want to describe complex objects. This caused a longer coding, but I don't think we are even close to the scale you are talking about. We can remove the tree flexibility, and move to array format, it will make the code of storing data by the driver to be shorter, but we will lose the ability to have it in tree-like format. And again, regarding ethtool, it is a tool for dumping HW/FW, this could have been an argument for the region snapshot, not for the devlink health... > > And on the bike shedding side :) -> I think you should steer clear of > calling this objdump, as it has very little to do with the > functionality of well-known objdump tool. Its not even clear what the > object the name is referring to is. Let's agree on concept, we can change name to dump. Reporter->dump is very clear when you know what the reporter is. > > Long story short the overlap with region snapshots makes it unclear > what purpose either serves, and IMHO we should avoid the marshalling by > teaching user space how to interpret snapshots. Preferably we only > have one dump mechanism, and user space can be taught the interpretation > once. Forcing SW reporters to use region snapshot mechanism sounds like a bad idea. To summarize, In order to have the health system working properly, it must have a way to objdump/dump itself and provide it to the developer for debug. Removing the objdump part will make it useless for run-time debug. I think this is a powerful tool and we shall not ask the user level scripts to fetch info from many other partial tools. It shall all be focused in one place (recover, diagnose, objdump, statistics). > >> Retrieves the last stored objdump. Devlink health >> saves a single objdump. If an objdump is not already stored by the devlink >> for this reporter, devlink generates a new objdump. >> Objdump output is defined by the reporter. >> DEVLINK_CMD_HEALTH_REPORTER_OBJDUMP_CLEAR >> Clears the last saved objdump file for the specified reporter. >> >> >> netlink >> +--------------------------+ >> | | >> | + | >> | | | >> +--------------------------+ >> |request for ops >> |(diagnose, >> mlx5_core devlink |recover, >> |dump) >> +--------+ +--------------------------+ >> | | | reporter| | >> | | | +---------v----------+ | >> | | ops execution | | | | >> | <----------------------------------+ | | >> | | | | | | >> | | | + ^------------------+ | >> | | | | request for ops | >> | | | | (recover, dump) | >> | | | | | >> | | | +-+------------------+ | >> | | health report | | health handler | | >> | +-------------------------------> | | >> | | | +--------------------+ | >> | | health reporter create | | >> | +----------------------------> | >> +--------+ +--------------------------+ >> >> Available reporters: >> In this patchset, three reporters of mlx5e driver are included. The FW >> reporters implement devlink_health_reporter diagnostic, object dump and >> recovery procedures. The TX reporter implements devlink_health_reporter >> diagnostic and recovery procedures. >> >> This RFC is based on the same concepts as previous RFC I sent earlier this >> year: "[RFC PATCH iproute2-next] System specification health API". The API >> was >> changed, also devlink code and mlx5e reporters were not available at the >> previous RFC.