Hi Gilles,

I will start exploring the links you gave me. I would suggest that logistic/probit regression go under the regression package. Not that clustering is really any different, but it makes sense to find logistic "regression" in a package named as such.

Regards,
Marios

> Date: Fri, 7 Sep 2012 17:48:12 +0200
> From: gil...@harfang.homelinux.org
> To: dev@commons.apache.org
> Subject: Re: [math] Logistic, Probit regerssion and Tolerance checks
>
> Hi.
>
> > My name is Marios. I have a very good academic background and have
> > worked as a modeling analyst on big projects, so I have experience
> > with prediction and optimization algorithms.
>
> Welcome to Commons Math's forum.
>
> > Recently (5 months ago), I started learning Java, and I have made my
> > life much simpler by using Java and Commons Math rather than depending
> > on the common packages (SAS, SPSS, etc.). Obviously, I owe Commons
> > Math a lot.
>
> That's good to read.
>
> > I have noticed that the site does not have logistic regression and
> > probit regression, which are very commonly used in classification
> > problems. Additionally, the math package does not provide a way to
> > assess Tolerance (or VIF), very commonly used to avoid
> > multicollinearity issues and singular matrices in optimization
> > algorithms, prior to running them.
> >
> > I am willing to provide complete Logistic and Probit regression
> > algorithms, optimized by the Newton-Raphson maximum-likelihood method,
> > in a programmatically very easy way (e.g. regression(double matrix[][],
> > double Target[], String Constant, double precision, double tolerance)),
> > with academic references, and very quick (3 secs for a 60k set), with
> > getter methods for all the common statistics such as null Deviance,
> > Deviance, AIC, BIC, Chi-square of the model, betas, Wald statistics
> > and p-values, Cox-Snell R-square, Nagelkerke’s R-square, pseudo-R2,
> > residuals, probabilities, classification matrix.
>
> Such contributions would certainly be most welcome.
>
> But care must be taken in how to fit those features into Commons Math.
> I mean that the new implementations should be integrated in the API of
> similar functionalities, if they currently exist.
>
> IIUC, the proposal could be related to code currently in package
>   org.apache.commons.math3.stat.clustering
> and/or to the pending improvements suggested in this report:
>   https://issues.apache.org/jira/browse/MATH-748
>
> [By the way, I wonder whether "clustering" should really be under "stat",
> rather than, say, "optimization" or a package of its own, one level up.]
>
> In any case, it might be worth discussing here some design issues, before
> you start adapting your code. At the same time, you should open tickets
> on the bug tracking system:
>   https://issues.apache.org/jira/browse/MATH
> Preferably, there should be a general request for "New feature"; then
> several "sub-issues" could be linked to that one, each referring to a
> specific task (typically a class, with its unit tests).
>
> > I have also included steps for checking tolerance so that we avoid
> > cases that fail to converge. Generally the algorithm is not very
> > expensive for the RAM (because I have approximated the Hessian matrix),
> > and the only external jar that I use is Commons Math, for matrix
> > multiplications.
>
> Although the performance issue is certainly important, it is an
> "implementation detail" that should not preempt a clear API (i.e. one
> that reflects the mathematical concepts) and the reuse of existing
> classes (those can be improved at the same time, if your proposal
> reveals that something is lacking).
>
> Thanks for your interest,
> Gilles
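
For readers following the thread, here is a minimal sketch of the technique under discussion: logistic regression fitted by Newton-Raphson maximum likelihood. It is not the code offered in the thread and not part of Commons Math. The class name `LogisticNewtonSketch`, the method names, and the `pivotTolerance` parameter are all hypothetical; the `fit` signature only loosely mirrors the `regression(matrix, Target, ..., precision, tolerance)` call mentioned in the quoted message. The sketch uses the exact information matrix (the quoted message mentions an approximated Hessian instead) and plain arrays rather than Commons Math matrix classes, purely to stay self-contained. The pivot check only detects a numerically singular system; it is not a Tolerance/VIF computation.

```java
import java.util.Arrays;

/** Hypothetical illustration only; not an existing Commons Math class. */
class LogisticNewtonSketch {

    /** Fitted probability for one row of predictors; beta[0] is the intercept. */
    static double predict(double[] beta, double[] row) {
        double eta = beta[0];
        for (int j = 0; j < row.length; j++) {
            eta += beta[j + 1] * row[j];
        }
        return 1.0 / (1.0 + Math.exp(-eta));
    }

    /**
     * Newton-Raphson for the logistic log-likelihood.
     * Stops when the largest coefficient update is below 'precision'.
     */
    static double[] fit(double[][] x, double[] y, double precision, double pivotTolerance) {
        int k = x[0].length + 1;              // predictors plus intercept
        double[] beta = new double[k];        // start from all zeros
        for (int iter = 0; iter < 50; iter++) {
            double[] score = new double[k];   // gradient Z'(y - p)
            double[][] info = new double[k][k]; // information matrix Z'WZ
            for (int i = 0; i < x.length; i++) {
                double p = predict(beta, x[i]);
                double w = p * (1.0 - p);
                for (int a = 0; a < k; a++) {
                    double za = (a == 0) ? 1.0 : x[i][a - 1];
                    score[a] += (y[i] - p) * za;
                    for (int b = 0; b < k; b++) {
                        double zb = (b == 0) ? 1.0 : x[i][b - 1];
                        info[a][b] += w * za * zb;
                    }
                }
            }
            double[] step = solve(info, score, pivotTolerance);
            double largest = 0.0;
            for (int a = 0; a < k; a++) {
                beta[a] += step[a];
                largest = Math.max(largest, Math.abs(step[a]));
            }
            if (largest < precision) {
                return beta;
            }
        }
        throw new IllegalStateException("Newton-Raphson did not converge");
    }

    /** Gaussian elimination with partial pivoting; overwrites a and b. */
    static double[] solve(double[][] a, double[] b, double pivotTolerance) {
        int k = b.length;
        for (int col = 0; col < k; col++) {
            int pivot = col;
            for (int r = col + 1; r < k; r++) {
                if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) {
                    pivot = r;
                }
            }
            if (Math.abs(a[pivot][col]) < pivotTolerance) {
                throw new IllegalArgumentException("Information matrix is numerically singular");
            }
            double[] tmpRow = a[col]; a[col] = a[pivot]; a[pivot] = tmpRow;
            double tmp = b[col]; b[col] = b[pivot]; b[pivot] = tmp;
            for (int r = col + 1; r < k; r++) {
                double f = a[r][col] / a[col][col];
                b[r] -= f * b[col];
                for (int c = col; c < k; c++) {
                    a[r][c] -= f * a[col][c];
                }
            }
        }
        double[] sol = new double[k];
        for (int r = k - 1; r >= 0; r--) {    // back substitution
            double s = b[r];
            for (int c = r + 1; c < k; c++) {
                s -= a[r][c] * sol[c];
            }
            sol[r] = s / a[r][r];
        }
        return sol;
    }

    public static void main(String[] args) {
        double[][] x = {{0}, {1}, {2}, {3}, {4}, {5}, {6}, {7}};
        double[] y = {0, 0, 1, 0, 1, 1, 0, 1};
        System.out.println(Arrays.toString(fit(x, y, 1e-10, 1e-12)));
    }
}
```

A call such as `LogisticNewtonSketch.fit(x, y, 1e-10, 1e-12)` returns the intercept followed by one coefficient per column of `x`. Duplicated predictor columns make the information matrix singular, and the sketch rejects them with an exception rather than iterating; a real contribution would presumably report this through whatever mechanism the existing regression API uses.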