Le lun. 1 juil. 2019 à 22:47, Alex Herbert <alex.d.herb...@gmail.com> a écrit :
>
> Re-sent to list.
>
> > On 1 Jul 2019, at 18:49, Gilles Sadowski <gillese...@gmail.com> wrote:
> >
> > Le lun. 1 juil. 2019 à 19:10, Alex Herbert <alex.d.herb...@gmail.com 
> > <mailto:alex.d.herb...@gmail.com>> a écrit :
> >>
> >> On 01/07/2019 08:59, Gilles Sadowski wrote:
> >>> Hello.
> >>>
> >>> Le lun. 1 juil. 2019 à 00:50, Virendra singh Rajpurohit
> >>> <virendrasing...@gmail.com> a écrit :
> >>>> Hi everyone,
> >>>> I've been working on descriptive module of "Commons-Statistics". My
> >>>> question is should I extend
> >>> I'll give hints as if the question mark was after the word "extend".
> >>>
> >>> For starter, please read this:
> >>>     
> >>> https://books.google.fr/books?id=ka2VUBqHiWkC&pg=PA81&lpg=PA81&dq=java+inheritance+bloch&source=bl&ots=y_IgRku_OX&sig=ACfU3U3VMmrHT2ILGJQluunGLJcEOMICBA&hl=fr&sa=X&ved=2ahUKEwicsvv6nJPjAhVwx4UKHdU6BFUQ6AEwBXoECAkQAQ#v=onepage&q=java%20inheritance%20bloch&f=false
> >>> (and the whole book is advice to follow).
> >>>
> >>> Then, there might be some useful information here:
> >>>     
> >>> https://stackoverflow.com/questions/218744/good-reasons-to-prohibit-inheritance-in-java
> >>>
> >>> Regards,
> >>> Gilles
> >>
> >> This issue arises because the algorithms for the mean and the variance
> >> share computation and the standard deviation is derived from the variance.
> >>
> >> They can be combined using inheritance. The reference above indicates
> >> that this is poor practice unless you specifically design for it.
> >> Splitting them all apart as separate classes will result in code
> >> duplication. This is also considered poor practice without a good
> >> reason. However there is one good reason in this case (as I will show
> >> later).
> >
> > Avoiding duplication is a corollary of inheritance, but not the condition
> > for using it.
> > There must an "is-a" relationship.
> >
> >> The solution in Commons Math was to have a package-private set of moment
> >> classes that have an inheritance hierarchy. The public facing classes
> >> can then wrap these internal classes and avoid inheritance. However in
> >> Commons math some of the public constructors depended on package private
> >> moment classes, and one moment class was public. So the API was flawed.
> >> This is the subject of an issue which discusses removing the moment
> >> classes and avoiding inheritance [1]. The outcome of this discussion
> >> (from 2016) is not in the Jira ticket.
> >>
> >> There is also the question of whether the third and fourth central
> >> moments should be ported from Commons Math with the associated Skewness
> >> and Kurtosis statistics. It should be noted that at the limit of this
> >> hierarchy the fourth central moment has the data to compute Mean,
> >> Variance+Standard Deviation, Skewness and Kurtosis.
> >>
> >> Although it is possible to compute many statistics from a moment I would
> >> say it can be assumed if the user has selected a specific statistic
> >> class then they are only interested in 1 statistic. Construction of a
> >> flexible SummaryStatistics class to provide multiple statistics should
> >> be part of the design and so objects to compute different moments and
> >> the ability to use them to compute different statistics should be part
> >> of the design. This can be addressed separately but should be kept in mind.
> >>
> >> If inheritance is eliminated at the jump from 1st to 2nd moment it will
> >> result in a small amount of code duplication.:
> >>
> >> Moment1 = 4 lines
> >> Moment2 = 1 extra line (5 total)
> >> Moment3 = 4 extra lines (9 total)
> >> Moment4 = 4 extra lines (13 total)
> >>
> >> However all protected variables used in the hierarchy can be local to
> >> the method and the code (although duplicated) may be easier to follow.
> >>
> >> If the requirements of a moment are specified in an interface it can be
> >> seen that the hierarchy is simple:
> >>
> >> interface FirstMoment {
> >>     long getN();
> >>     double getM1();
> >>     void accept(double d);
> >>     void combine(FirstMoment m1);
> >> }
> >>
> >> interface SecondMoment extends FirstMoment {
> >>     double getM2();
> >>     void combine(SecondMoment m2);
> >> }
> >>
> >> interface ThirdMoment extends SecondMoment {
> >>     double getM3();
> >>     void combine(ThirdMoment m3);
> >> }
> >
> > "is-a" condition is not met.
>
> Do you mean that ThirdMoment “is-a” SecondMoment is not true? While this is 
> correct the way the algorithm is implemented means that a ThirdMoment does 
> have all the data for the lower order moments.
>
> >
> > Another direction to explore:
> >
> > interface Moment<T> {
> >    long getN();
> >    double getValue();
> >    void accept(double);
> >    void combine(Moment<T> m);
> > }
>
> And <T> is an enum?

It's not what I had in mind, but I didn't pursue it because I thought
that testing different approaches was Virendra's task...
As you point out much below, first step is to lay out the minimally
supported API.

Gilles

> I do not see how this helps this:
>
> ThirdMoment m3 = …
> Variance var = new Variance(m3);
>
> If ThirdMoment is to be used generically it needs a method to obtain the 
> value for all moments from 3 to 1. The single ‘getValue()’ is not enough. It 
> would require:
>
> interface Moment<T extends MomentOrder> {
>    long getN();
>    double getValue(M extends MomentOrder);
>    void accept(double);
>    void combine(Moment<T> m);
> }
>
> Where M is <= T in the enum ranking. So basically it is an integer and you 
> can restrict the combine method using self-types:
>
> interface Moment<T extends Moment<T>> {
>    long getN();
>    double getValue(int order);
>    void accept(double);
>    void combine(T m);
> }
>
> With this you have a contract that a Moment can only be combined with 
> instances of the same class. But you do not have a restriction on what orders 
> you will support. So for example the Variance constructor is:
>
> <T extends Moment<T>> Variance(Moment<T> m) {
>     // Now stuck with a generic moment. Data can be extracted with 
> getValue(int)
>     // to create a concrete internal moment. But then the input moment is not 
> used
>     // by reference. To do so requires Variance is parameterised with T as 
> well.
> }
>
> Note that all this is to support a SummaryStatistics type class that uses 
> SecondMoment to obtain a mean, standard deviation and variance. If this is 
> the extent of the usage then support for a generic hierarchy is not required.
>
> So put all any moments as package private and they can be changed in the 
> future. Basically see how far the new API in numbers can be taken before a 
> decision must be made on whether a moment hierarchy is to be part of the API. 
> At current for Mean, Variance and StdDev just a single SecondMoment 
> implementation is required.
>
>
> >
> > Regards,
> > Gilles
> >
> >> Note that the combine method is not in Commons Math and allows the
> >> classes to be used as aggregators, for example in the collect method of
> >> the Streams API.
> >>
> >> The issue that I see is that the requirement to be used as an aggregator
> >> adds a new method to combine with another instance of the same class at
> >> each level of the hierarchy. This can result for example in a
> >> SecondMoment being updated with a FirstMoment and the result is invalid.
> >>
> >> To prevent this requires exceptions to be thrown when lower moments are
> >> combined with higher moments. That is not a friendly API. The only
> >> absolute way I see to prevent this is to remove inheritance and take on
> >> the burden of code duplication at each additional level of moment.
> >> Alternatively exploiting the inheritance to remove code duplication will
> >> require careful use. This could be done with package level classes.
> >>
> >>
> >> Here is a suggestion of one way forward:
> >>
> >> Create public Mean, Variance and StandardDeviation classes. These define
> >> the public API and what you are to support.
> >>
> >> Create a package private SecondMoment class. This class can be used by
> >> both Variance and StandardDeviation to do the computation. Only the
> >> final step to compute the output statistic is different.
> >>
> >> There will be no inheritance between SecondMoment and Mean.
> >>
> >> If ThirdMoment and FourthMoment are later added then this can be done
> >> with code duplication. The use of duplication should be documented as a
> >> deliberate avoidance of inheritance to prevent combination of moments of
> >> different orders.
> >>
> >> Note that a SecondMoment can be used as the workhorse for a
> >> SummaryStatistics class. This is if the Commons Math pattern is used
> >> where the Mean and Variance can accept a SecondMoment in a constructor
> >> or have a static computation method which accepts the moment data.
> >>
> >> TBD:
> >>
> >> I do not see why the SecondMoment could not be public. If it is a
> >> standalone class free from inheritance it may be of use to someone.
> >> Here's an example if the SecondMoment exposes all the data using
> >> accessors that can then be used:
> >>
> >> double[] d = { 1.1, 2.2, 3.3 };
> >> SecondMoment m2 = Arrays.stream(d).collect(SecondMoment::new,
> >>                                            SecondMoment::accept,
> >>                                            SecondMoment::combine);
> >> double mean = m2.getM1();
> >> StandardDeviation sd = new StandardDeviation(m2);
> >> sd.setBiasCorrected(true);
> >> double stdDev = sd.getStandardDeviation();
> >>
> >> One issue with removing inheritance is how to write a SummaryStatistics
> >> class that could for example use a ThirdMoment to compute Mean,
> >> StandardDeviation and Skewness:
> >>
> >> double[] d = { 1.1, 2.2, 3.3 };
> >> ThirdMoment m3 = Arrays.stream(d).collect(ThirdMoment::new,
> >>                                            ThirdMoment::accept,
> >>                                            ThirdMoment::combine);
> >> double mean = m3.getM1();
> >> StandardDeviation sd = new StandardDeviation(m3); // INHERITANCE EXPLOITED
> >> sd.setBiasCorrected(true);
> >> double stdDev = sd.getStandardDeviation();
> >> double skew = new Skewness(m3).getSkewness();
> >>
> >> Without inheritance each could be done using static helper methods:
> >>
> >> static double getVariance(long n, double m1, double m2);
> >> static double getSkewness(long n, double m1, double m2, double m3);
> >>
> >> Note: If the SummaryStatistics is only to support the same functionality
> >> as CM SummaryStatistics then this is all possible using a SecondMoment
> >> as described above.
> >>
> >> So I seem to have created some more questions and shown part of a
> >> possible solution. Without the API for the SummaryStatistics it may not
> >> be possible to choose a solution. The question is how flexible does the
> >> API need to be with regard to reusing the moment data?
> >>
> >> Opinions on this?
> >>
> >>
> >> [1] https://issues.apache.org/jira/browse/MATH-1228
> >>
> >>>
> >>>> Variance class in StandardDeviaiton class?
> >>>> Also, should Moment classes be developed and inherited in Variance? As of
> >>>> now, I created only FirstMoment  & Variance class. If the answer for
> >>>> inheritance is NO, should we rename FirstMoment to Mean?
> >>>> Thanks
> >>>> --
> >>>> Warm Regards
> >>>> *Virendra Singh Rajpurohit*
> >>>> *University of Petroleum and Energy Studies,Dehradun*
> >>>> Linkedin:https://www.linkedin.com/in/virendra-singh-rajpurohit

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org

Reply via email to