On Tue, Jun 26, 2018 at 04:47:05PM +0000, Vadim Pasternak wrote:
> 
> 
> > -----Original Message-----
> > From: Guenter Roeck [mailto:li...@roeck-us.net]
> > Sent: Tuesday, June 26, 2018 7:33 PM
> > To: Vadim Pasternak <vad...@mellanox.com>
> > Cc: Andrew Lunn <and...@lunn.ch>; da...@davemloft.net;
> > netdev@vger.kernel.org; rui.zh...@intel.com; edubez...@gmail.com;
> > j...@resnulli.us; mlxsw <ml...@mellanox.com>; Michael Shych
> > <michae...@mellanox.com>
> > Subject: Re: [patch net-next RFC 11/12] mlxsw: core: Extend hwmon interface
> > with FAN fault attribute
> > 
> > On Tue, Jun 26, 2018 at 02:47:01PM +0000, Vadim Pasternak wrote:
> > >
> > >
> > > > -----Original Message-----
> > > > From: Andrew Lunn [mailto:and...@lunn.ch]
> > > > Sent: Tuesday, June 26, 2018 5:29 PM
> > > > To: Vadim Pasternak <vad...@mellanox.com>
> > > > Cc: da...@davemloft.net; netdev@vger.kernel.org; li...@roeck-us.net;
> > > > rui.zh...@intel.com; edubez...@gmail.com; j...@resnulli.us; mlxsw
> > > > <ml...@mellanox.com>; Michael Shych <michae...@mellanox.com>
> > > > Subject: Re: [patch net-next RFC 11/12] mlxsw: core: Extend hwmon
> > > > interface with FAN fault attribute
> > > >
> > > > > +static ssize_t mlxsw_hwmon_fan_fault_show(struct device *dev,
> > > > > +                                       struct device_attribute *attr,
> > > > > +                                       char *buf)
> > > > > +{
> > > > > +     struct mlxsw_hwmon_attr *mlwsw_hwmon_attr =
> > > > > +                     container_of(attr, struct mlxsw_hwmon_attr,
> > > > dev_attr);
> > > > > +     struct mlxsw_hwmon *mlxsw_hwmon = mlwsw_hwmon_attr->hwmon;
> > > > > +     char mfsm_pl[MLXSW_REG_MFSM_LEN];
> > > > > +     u16 tach;
> > > > > +     int err;
> > > > > +
> > > > > +     mlxsw_reg_mfsm_pack(mfsm_pl, mlwsw_hwmon_attr->type_index);
> > > > > +     err = mlxsw_reg_query(mlxsw_hwmon->core, MLXSW_REG(mfsm),
> > > > mfsm_pl);
> > > > > +     if (err) {
> > > > > +             dev_err(mlxsw_hwmon->bus_info->dev, "Failed to query
> > > > fan\n");
> > > > > +             return err;
> > > > > +     }
> > > > > +     tach = mlxsw_reg_mfsm_rpm_get(mfsm_pl);
> > > > > +
> > > > > +     return sprintf(buf, "%u\n", (tach < mlxsw_hwmon->tach_min) ? 1 :
> > > > > +0); }
> > > >
> > > > Documentation/hwmon/sysfs-interface says:
> > > >
> > > > Alarms are direct indications read from the chips. The drivers do
> > > > NOT make comparisons of readings to thresholds. This allows
> > > > violations between readings to be caught and alarmed. The exact
> > > > definition of an alarm (for example, whether a threshold must be met
> > > > or must be exceeded to cause an alarm) is chip-dependent.
> > > >
> > > > Now, this is a fault, not an alarm. But does the same apply?
> > >
> > Yes, it does. There are no "soft" alarms / faults.
> > 
> > > Hi Andrew,
> > >
> > > Hardware provides minimum value for tachometer.
> > > Tachometer is considered as faulty in case it's below this value.
> > 
> > This is for user space to decide, not for the kernel.
> 
> Hi Guenter,
> 
> Do you suggest to expose provide fan{x}_min, instead of fan{x}_fault
> and give to user to compare fan{x}_input versus fan{x}_min for the
> fault decision?
> 

fanX_min only makes sense if programmed into or reported by the chip
or controller (that is what the attribute is for), usually to enable
the chip/controller to set an alarm. If the chip or controller does
not have a minimum speed register, the attribute should not exist,
and any decision based on a comparison between a minimum fan speed
and the actual fan speed is a user space problem.

I don't know what the tach_min calculation is about, but setting
it to the minimum of all tachometer speeds (or of all reported
minimums ?) is not the task of a hwmon driver. A hwmon driver
reports what it gets from hardware; the interpretation is up
to other parts of the system (eg userspace or the thermal
subsystem). That includes a software-based decision if an alarm
or fault should be reported or not.

> > 
> > > In case any tachometer is faulty, PWM according to the system
> > > requirements should be set to 100% until the fault
> > 
> > system requirements. Again, this is for user space to decide.
> 
> 
> Yes, user should decide in this case and I wanted to provide to user
> fan{x}_fault for this matter. But it could do it based on input and min
> attributes, of course.
> 
Note that "fault" and "alarm" do have distinct different meanings.
Many fan controllers can detect if a fan is faulty (eg no sensor
connected or it is deemed faulty) or if it just runs too slow.
The typical remedy is also different: A slow fan may just need
more pwm or voltage, a faulty fan needs to be replaced.

Guenter

> > 
> > > is not recovered (f.e. by physical replacing of bad unit).
> > > This is the motivation to expose fan{x}_fault in the way it's exposed.
> > >
> > > Thanks,
> > > Vadim.
> > >
> > > >
> > > >      Andrew

Reply via email to