On Tue, Jun 26, 2018 at 04:47:05PM +0000, Vadim Pasternak wrote:
>
>
> > -----Original Message-----
> > From: Guenter Roeck [mailto:li...@roeck-us.net]
> > Sent: Tuesday, June 26, 2018 7:33 PM
> > To: Vadim Pasternak <vad...@mellanox.com>
> > Cc: Andrew Lunn <and...@lunn.ch>; da...@davemloft.net;
> > netdev@vger.kernel.org; rui.zh...@intel.com; edubez...@gmail.com;
> > j...@resnulli.us; mlxsw <ml...@mellanox.com>; Michael Shych
> > <michae...@mellanox.com>
> > Subject: Re: [patch net-next RFC 11/12] mlxsw: core: Extend hwmon interface
> > with FAN fault attribute
> >
> > On Tue, Jun 26, 2018 at 02:47:01PM +0000, Vadim Pasternak wrote:
> > >
> > >
> > > > -----Original Message-----
> > > > From: Andrew Lunn [mailto:and...@lunn.ch]
> > > > Sent: Tuesday, June 26, 2018 5:29 PM
> > > > To: Vadim Pasternak <vad...@mellanox.com>
> > > > Cc: da...@davemloft.net; netdev@vger.kernel.org; li...@roeck-us.net;
> > > > rui.zh...@intel.com; edubez...@gmail.com; j...@resnulli.us; mlxsw
> > > > <ml...@mellanox.com>; Michael Shych <michae...@mellanox.com>
> > > > Subject: Re: [patch net-next RFC 11/12] mlxsw: core: Extend hwmon
> > > > interface with FAN fault attribute
> > > >
> > > > > +static ssize_t mlxsw_hwmon_fan_fault_show(struct device *dev,
> > > > > +					  struct device_attribute *attr,
> > > > > +					  char *buf)
> > > > > +{
> > > > > +	struct mlxsw_hwmon_attr *mlwsw_hwmon_attr =
> > > > > +			container_of(attr, struct mlxsw_hwmon_attr, dev_attr);
> > > > > +	struct mlxsw_hwmon *mlxsw_hwmon = mlwsw_hwmon_attr->hwmon;
> > > > > +	char mfsm_pl[MLXSW_REG_MFSM_LEN];
> > > > > +	u16 tach;
> > > > > +	int err;
> > > > > +
> > > > > +	mlxsw_reg_mfsm_pack(mfsm_pl, mlwsw_hwmon_attr->type_index);
> > > > > +	err = mlxsw_reg_query(mlxsw_hwmon->core, MLXSW_REG(mfsm), mfsm_pl);
> > > > > +	if (err) {
> > > > > +		dev_err(mlxsw_hwmon->bus_info->dev, "Failed to query fan\n");
> > > > > +		return err;
> > > > > +	}
> > > > > +	tach = mlxsw_reg_mfsm_rpm_get(mfsm_pl);
> > > > > +
> > > > > +	return sprintf(buf, "%u\n", (tach < mlxsw_hwmon->tach_min) ? 1 : 0);
> > > > > +}
> > > >
> > > > Documentation/hwmon/sysfs-interface says:
> > > >
> > > > Alarms are direct indications read from the chips. The drivers do
> > > > NOT make comparisons of readings to thresholds. This allows
> > > > violations between readings to be caught and alarmed. The exact
> > > > definition of an alarm (for example, whether a threshold must be met
> > > > or must be exceeded to cause an alarm) is chip-dependent.
> > > >
> > > > Now, this is a fault, not an alarm. But does the same apply?
> > >
> > Yes, it does. There are no "soft" alarms / faults.
> >
> > > Hi Andrew,
> > >
> > > Hardware provides minimum value for tachometer.
> > > Tachometer is considered as faulty in case it's below this value.
> >
> > This is for user space to decide, not for the kernel.
>
> Hi Guenter,
>
> Do you suggest to expose provide fan{x}_min, instead of fan{x}_fault
> and give to user to compare fan{x}_input versus fan{x}_min for the
> fault decision?
>
fanX_min only makes sense if programmed into or reported by the chip or
controller (that is what the attribute is for), usually to enable the
chip/controller to set an alarm. If the chip or controller does not have
a minimum speed register, the attribute should not exist, and any decision
based on a comparison between a minimum fan speed and the actual fan speed
is a user space problem.

I don't know what the tach_min calculation is about, but setting it to
the minimum of all tachometer speeds (or of all reported minimums?) is
not the task of a hwmon driver. A hwmon driver reports what it gets from
hardware; the interpretation is up to other parts of the system (e.g.
userspace or the thermal subsystem). That includes a software-based
decision if an alarm or fault should be reported or not.

> >
> > > In case any tachometer is faulty, PWM according to the system
> > > requirements should be set to 100% until the fault
> >
> > system requirements. Again, this is for user space to decide.
> >
> Yes, user should decide in this case and I wanted to provide to user
> fan{x}_fault for this matter. But it could do it based on input and min
> attributes, of course.
>
Note that "fault" and "alarm" have distinctly different meanings. Many
fan controllers can detect if a fan is faulty (e.g. no sensor connected
or it is deemed faulty) or if it just runs too slow. The typical remedy
is also different: a slow fan may just need more pwm or voltage, a faulty
fan needs to be replaced.

Guenter

> >
> > > is not recovered (f.e. by physical replacing of bad unit).
> > > This is the motivation to expose fan{x}_fault in the way it's exposed.
> > >
> > > Thanks,
> > > Vadim.
> > >
> > > >
> > > > Andrew
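
A minimal user space sketch of the division of labour described above: the
driver only reports fanX_input, and the comparison against a minimum speed
is policy that lives outside the kernel. The helper names read_attr_long()
and fan_too_slow(), and the idea of taking min_rpm from user configuration,
are assumptions made for illustration; they are not part of the patch or of
any hwmon library.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: read one integer from a sysfs-style attribute
 * file such as fanX_input.
 * Returns 0 on success, a negative errno value on failure.
 */
static int read_attr_long(const char *path, long *val)
{
	char buf[32];
	char *end;
	FILE *f;

	f = fopen(path, "r");
	if (!f)
		return -errno;

	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -EIO;
	}
	fclose(f);

	errno = 0;
	*val = strtol(buf, &end, 10);
	if (errno || end == buf)
		return -EINVAL;

	return 0;
}

/*
 * Hypothetical policy check, done in user space rather than in the
 * driver. min_rpm is assumed to come from user configuration, not
 * from a hwmon attribute.
 * Returns 1 if the fan runs slower than min_rpm, 0 otherwise.
 */
static int fan_too_slow(long rpm, long min_rpm)
{
	return rpm < min_rpm;
}
```

A monitoring daemon could, for example, call read_attr_long() on a path
such as /sys/class/hwmon/hwmon0/fan1_input (the hwmon index varies between
systems) and decide what to do with the PWM when fan_too_slow() returns
nonzero. A hardware-detected fault would still be reported separately, and
only if the chip actually provides such an indication.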