Le lun. 1 juil. 2019 à 22:47, Alex Herbert <alex.d.herb...@gmail.com> a écrit : > > Re-sent to list. > > > On 1 Jul 2019, at 18:49, Gilles Sadowski <gillese...@gmail.com> wrote: > > > > Le lun. 1 juil. 2019 à 19:10, Alex Herbert <alex.d.herb...@gmail.com > > <mailto:alex.d.herb...@gmail.com>> a écrit : > >> > >> On 01/07/2019 08:59, Gilles Sadowski wrote: > >>> Hello. > >>> > >>> Le lun. 1 juil. 2019 à 00:50, Virendra singh Rajpurohit > >>> <virendrasing...@gmail.com> a écrit : > >>>> Hi everyone, > >>>> I've been working on descriptive module of "Commons-Statistics". My > >>>> question is should I extend > >>> I'll give hints as if the question mark was after the word "extend". > >>> > >>> For starter, please read this: > >>> > >>> https://books.google.fr/books?id=ka2VUBqHiWkC&pg=PA81&lpg=PA81&dq=java+inheritance+bloch&source=bl&ots=y_IgRku_OX&sig=ACfU3U3VMmrHT2ILGJQluunGLJcEOMICBA&hl=fr&sa=X&ved=2ahUKEwicsvv6nJPjAhVwx4UKHdU6BFUQ6AEwBXoECAkQAQ#v=onepage&q=java%20inheritance%20bloch&f=false > >>> (and the whole book is advice to follow). > >>> > >>> Then, there might be some useful information here: > >>> > >>> https://stackoverflow.com/questions/218744/good-reasons-to-prohibit-inheritance-in-java > >>> > >>> Regards, > >>> Gilles > >> > >> This issue arises because the algorithms for the mean and the variance > >> share computation and the standard deviation is derived from the variance. > >> > >> They can be combined using inheritance. The reference above indicates > >> that this is poor practice unless you specifically design for it. > >> Splitting them all apart as separate classes will result in code > >> duplication. This is also considered poor practice without a good > >> reason. However there is one good reason in this case (as I will show > >> later). > > > > Avoiding duplication is a corollary of inheritance, but not the condition > > for using it. > > There must an "is-a" relationship. > > > >> The solution in Commons Math was to have a package-private set of moment > >> classes that have an inheritance hierarchy. The public facing classes > >> can then wrap these internal classes and avoid inheritance. However in > >> Commons math some of the public constructors depended on package private > >> moment classes, and one moment class was public. So the API was flawed. > >> This is the subject of an issue which discusses removing the moment > >> classes and avoiding inheritance [1]. The outcome of this discussion > >> (from 2016) is not in the Jira ticket. > >> > >> There is also the question of whether the third and fourth central > >> moments should be ported from Commons Math with the associated Skewness > >> and Kurtosis statistics. It should be noted that at the limit of this > >> hierarchy the fourth central moment has the data to compute Mean, > >> Variance+Standard Deviation, Skewness and Kurtosis. > >> > >> Although it is possible to compute many statistics from a moment I would > >> say it can be assumed if the user has selected a specific statistic > >> class then they are only interested in 1 statistic. Construction of a > >> flexible SummaryStatistics class to provide multiple statistics should > >> be part of the design and so objects to compute different moments and > >> the ability to use them to compute different statistics should be part > >> of the design. This can be addressed separately but should be kept in mind. > >> > >> If inheritance is eliminated at the jump from 1st to 2nd moment it will > >> result in a small amount of code duplication.: > >> > >> Moment1 = 4 lines > >> Moment2 = 1 extra line (5 total) > >> Moment3 = 4 extra lines (9 total) > >> Moment4 = 4 extra lines (13 total) > >> > >> However all protected variables used in the hierarchy can be local to > >> the method and the code (although duplicated) may be easier to follow. > >> > >> If the requirements of a moment are specified in an interface it can be > >> seen that the hierarchy is simple: > >> > >> interface FirstMoment { > >> long getN(); > >> double getM1(); > >> void accept(double d); > >> void combine(FirstMoment m1); > >> } > >> > >> interface SecondMoment extends FirstMoment { > >> double getM2(); > >> void combine(SecondMoment m2); > >> } > >> > >> interface ThirdMoment extends SecondMoment { > >> double getM3(); > >> void combine(ThirdMoment m3); > >> } > > > > "is-a" condition is not met. > > Do you mean that ThirdMoment “is-a” SecondMoment is not true? While this is > correct the way the algorithm is implemented means that a ThirdMoment does > have all the data for the lower order moments. > > > > > Another direction to explore: > > > > interface Moment<T> { > > long getN(); > > double getValue(); > > void accept(double); > > void combine(Moment<T> m); > > } > > And <T> is an enum?
It's not what I had in mind, but I didn't pursue it because I thought that testing different approaches was Virendra's task... As you point out much below, first step is to lay out the minimally supported API. Gilles > I do not see how this helps this: > > ThirdMoment m3 = … > Variance var = new Variance(m3); > > If ThirdMoment is to be used generically it needs a method to obtain the > value for all moments from 3 to 1. The single ‘getValue()’ is not enough. It > would require: > > interface Moment<T extends MomentOrder> { > long getN(); > double getValue(M extends MomentOrder); > void accept(double); > void combine(Moment<T> m); > } > > Where M is <= T in the enum ranking. So basically it is an integer and you > can restrict the combine method using self-types: > > interface Moment<T extends Moment<T>> { > long getN(); > double getValue(int order); > void accept(double); > void combine(T m); > } > > With this you have a contract that a Moment can only be combined with > instances of the same class. But you do not have a restriction on what orders > you will support. So for example the Variance constructor is: > > <T extends Moment<T>> Variance(Moment<T> m) { > // Now stuck with a generic moment. Data can be extracted with > getValue(int) > // to create a concrete internal moment. But then the input moment is not > used > // by reference. To do so requires Variance is parameterised with T as > well. > } > > Note that all this is to support a SummaryStatistics type class that uses > SecondMoment to obtain a mean, standard deviation and variance. If this is > the extent of the usage then support for a generic hierarchy is not required. > > So put all any moments as package private and they can be changed in the > future. Basically see how far the new API in numbers can be taken before a > decision must be made on whether a moment hierarchy is to be part of the API. > At current for Mean, Variance and StdDev just a single SecondMoment > implementation is required. > > > > > > Regards, > > Gilles > > > >> Note that the combine method is not in Commons Math and allows the > >> classes to be used as aggregators, for example in the collect method of > >> the Streams API. > >> > >> The issue that I see is that the requirement to be used as an aggregator > >> adds a new method to combine with another instance of the same class at > >> each level of the hierarchy. This can result for example in a > >> SecondMoment being updated with a FirstMoment and the result is invalid. > >> > >> To prevent this requires exceptions to be thrown when lower moments are > >> combined with higher moments. That is not a friendly API. The only > >> absolute way I see to prevent this is to remove inheritance and take on > >> the burden of code duplication at each additional level of moment. > >> Alternatively exploiting the inheritance to remove code duplication will > >> require careful use. This could be done with package level classes. > >> > >> > >> Here is a suggestion of one way forward: > >> > >> Create public Mean, Variance and StandardDeviation classes. These define > >> the public API and what you are to support. > >> > >> Create a package private SecondMoment class. This class can be used by > >> both Variance and StandardDeviation to do the computation. Only the > >> final step to compute the output statistic is different. > >> > >> There will be no inheritance between SecondMoment and Mean. > >> > >> If ThirdMoment and FourthMoment are later added then this can be done > >> with code duplication. The use of duplication should be documented as a > >> deliberate avoidance of inheritance to prevent combination of moments of > >> different orders. > >> > >> Note that a SecondMoment can be used as the workhorse for a > >> SummaryStatistics class. This is if the Commons Math pattern is used > >> where the Mean and Variance can accept a SecondMoment in a constructor > >> or have a static computation method which accepts the moment data. > >> > >> TBD: > >> > >> I do not see why the SecondMoment could not be public. If it is a > >> standalone class free from inheritance it may be of use to someone. > >> Here's an example if the SecondMoment exposes all the data using > >> accessors that can then be used: > >> > >> double[] d = { 1.1, 2.2, 3.3 }; > >> SecondMoment m2 = Arrays.stream(d).collect(SecondMoment::new, > >> SecondMoment::accept, > >> SecondMoment::combine); > >> double mean = m2.getM1(); > >> StandardDeviation sd = new StandardDeviation(m2); > >> sd.setBiasCorrected(true); > >> double stdDev = sd.getStandardDeviation(); > >> > >> One issue with removing inheritance is how to write a SummaryStatistics > >> class that could for example use a ThirdMoment to compute Mean, > >> StandardDeviation and Skewness: > >> > >> double[] d = { 1.1, 2.2, 3.3 }; > >> ThirdMoment m3 = Arrays.stream(d).collect(ThirdMoment::new, > >> ThirdMoment::accept, > >> ThirdMoment::combine); > >> double mean = m3.getM1(); > >> StandardDeviation sd = new StandardDeviation(m3); // INHERITANCE EXPLOITED > >> sd.setBiasCorrected(true); > >> double stdDev = sd.getStandardDeviation(); > >> double skew = new Skewness(m3).getSkewness(); > >> > >> Without inheritance each could be done using static helper methods: > >> > >> static double getVariance(long n, double m1, double m2); > >> static double getSkewness(long n, double m1, double m2, double m3); > >> > >> Note: If the SummaryStatistics is only to support the same functionality > >> as CM SummaryStatistics then this is all possible using a SecondMoment > >> as described above. > >> > >> So I seem to have created some more questions and shown part of a > >> possible solution. Without the API for the SummaryStatistics it may not > >> be possible to choose a solution. The question is how flexible does the > >> API need to be with regard to reusing the moment data? > >> > >> Opinions on this? > >> > >> > >> [1] https://issues.apache.org/jira/browse/MATH-1228 > >> > >>> > >>>> Variance class in StandardDeviaiton class? > >>>> Also, should Moment classes be developed and inherited in Variance? As of > >>>> now, I created only FirstMoment & Variance class. If the answer for > >>>> inheritance is NO, should we rename FirstMoment to Mean? > >>>> Thanks > >>>> -- > >>>> Warm Regards > >>>> *Virendra Singh Rajpurohit* > >>>> *University of Petroleum and Energy Studies,Dehradun* > >>>> Linkedin:https://www.linkedin.com/in/virendra-singh-rajpurohit --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org