Hi Gilles,
I will start exploring the links you gave me.
I would suggest that logistic/probit regression go under the regression package.
Not that clustering is really any different, but it makes sense to find
logistic "regression" in a package named as such.
Regards,
Marios

> Date: Fri, 7 Sep 2012 17:48:12 +0200
> From: gil...@harfang.homelinux.org
> To: dev@commons.apache.org
> Subject: Re: [math] Logistic, Probit regression and Tolerance checks
> 
> Hi.
> 
> > 
> > My name is Marios. I have a very good academic background and I have
> > worked as a modeling analyst on big projects, so I have experience with
> > prediction and optimization algorithms.
> > 
> 
> Welcome to Commons Math's forum.
>  
> > 
> > Recently (5 months ago), I started learning Java, and I have made my
> > life much simpler by using Java and Commons Math rather than depending
> > on the common packages (SAS, SPSS, etc.). Obviously, I owe Commons Math
> > a lot.
> 
> That's good to read.
>  
> > 
> > I have noticed that the library does not have logistic regression or
> > probit regression, which are very commonly used in classification
> > problems. Additionally, the math package does not provide a way to
> > assess Tolerance (or VIF), very commonly used to detect multicollinearity
> > issues and singular matrices in optimization algorithms prior to running
> > them.
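
As an illustration of the Tolerance/VIF check described above, here is a minimal sketch in plain Java. It relies on the standard identity that the VIF of predictor j equals the j-th diagonal entry of the inverse of the predictors' correlation matrix, with Tolerance = 1 / VIF. The class name VifSketch and its methods are hypothetical; they are not part of Commons Math and not the code offered in the message. The sketch assumes no column is constant.

```java
/** Hypothetical helper for illustration only; not part of Commons Math. */
final class VifSketch {

    /**
     * Returns the variance inflation factor of each column of x
     * (rows are observations, columns are predictors).
     * Tolerance for column j is 1.0 / vif(x)[j].
     * Assumes every column has non-zero variance.
     */
    static double[] vif(double[][] x) {
        int n = x.length;
        int p = x[0].length;
        // Center each column and scale it to unit sum of squares.
        double[][] z = new double[n][p];
        for (int j = 0; j < p; j++) {
            double mean = 0.0;
            for (int i = 0; i < n; i++) mean += x[i][j];
            mean /= n;
            double ss = 0.0;
            for (int i = 0; i < n; i++) {
                z[i][j] = x[i][j] - mean;
                ss += z[i][j] * z[i][j];
            }
            double norm = Math.sqrt(ss);
            for (int i = 0; i < n; i++) z[i][j] /= norm;
        }
        // Correlation matrix R = Z^T Z.
        double[][] r = new double[p][p];
        for (int a = 0; a < p; a++)
            for (int b = 0; b < p; b++)
                for (int i = 0; i < n; i++) r[a][b] += z[i][a] * z[i][b];
        // VIF_j is the j-th diagonal entry of R^-1.
        double[][] inv = invert(r);
        double[] out = new double[p];
        for (int j = 0; j < p; j++) out[j] = inv[j][j];
        return out;
    }

    /** Gauss-Jordan inversion with partial pivoting. */
    static double[][] invert(double[][] a) {
        int p = a.length;
        double[][] aug = new double[p][2 * p];
        for (int i = 0; i < p; i++) {
            System.arraycopy(a[i], 0, aug[i], 0, p);
            aug[i][p + i] = 1.0;
        }
        for (int col = 0; col < p; col++) {
            int piv = col;
            for (int row = col + 1; row < p; row++)
                if (Math.abs(aug[row][col]) > Math.abs(aug[piv][col])) piv = row;
            if (Math.abs(aug[piv][col]) < 1e-12)
                throw new ArithmeticException("singular correlation matrix (perfect collinearity)");
            double[] tmp = aug[col]; aug[col] = aug[piv]; aug[piv] = tmp;
            double d = aug[col][col];
            for (int c = 0; c < 2 * p; c++) aug[col][c] /= d;
            for (int row = 0; row < p; row++) {
                if (row == col) continue;
                double f = aug[row][col];
                for (int c = 0; c < 2 * p; c++) aug[row][c] -= f * aug[col][c];
            }
        }
        double[][] inv = new double[p][p];
        for (int i = 0; i < p; i++) System.arraycopy(aug[i], p, inv[i], 0, p);
        return inv;
    }
}
```

A caller could reject or flag any predictor whose VIF exceeds a chosen threshold before starting the optimizer; the exception thrown on a singular correlation matrix signals perfect collinearity.
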
> > 
> >  
> > 
> > I am willing to provide complete logistic and probit regression
> > algorithms, fitted by maximum likelihood with Newton-Raphson
> > optimization, in a programmatically easy form (e.g.
> > regression(double matrix[][], double Target[], String Constant,
> > double precision, double tolerance)), with academic references. They are
> > very quick (3 secs for a 60k set) and have getter methods for all the
> > common statistics, such as null deviance, deviance, AIC, BIC, chi-square
> > of the model, betas, Wald statistics and p-values, Cox-Snell R-square,
> > Nagelkerke’s R-square, pseudo-R2, residuals, probabilities, and the
> > classification matrix.
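
For readers unfamiliar with the technique, the Newton-Raphson maximum-likelihood fit of a logistic model mentioned above can be sketched as follows. This is a plain-Java illustration under stated assumptions, not the contributed code: the class name LogisticSketch and the fit signature are hypothetical, an intercept is always added, and the exact Hessian is used (the message mentions an approximated Hessian, which this sketch does not attempt to reproduce).

```java
/** Hypothetical illustration; not part of Commons Math. */
final class LogisticSketch {

    /**
     * Fits P(y=1|x) = 1 / (1 + exp(-(b0 + b.x))) by Newton-Raphson.
     * Returns coefficients with the intercept at index 0.
     * Iterates until the largest coefficient update is below precision.
     */
    static double[] fit(double[][] x, double[] y, double precision, int maxIter) {
        int n = x.length;
        int k = x[0].length + 1; // +1 for the intercept
        double[] beta = new double[k];
        for (int iter = 0; iter < maxIter; iter++) {
            double[] grad = new double[k];      // X^T (y - p)
            double[][] hess = new double[k][k]; // X^T W X, W = diag(p (1 - p))
            for (int i = 0; i < n; i++) {
                double[] row = new double[k];
                row[0] = 1.0;
                System.arraycopy(x[i], 0, row, 1, k - 1);
                double eta = 0.0;
                for (int j = 0; j < k; j++) eta += row[j] * beta[j];
                double p = 1.0 / (1.0 + Math.exp(-eta));
                double w = p * (1.0 - p);
                for (int j = 0; j < k; j++) {
                    grad[j] += row[j] * (y[i] - p);
                    for (int m = 0; m < k; m++) hess[j][m] += w * row[j] * row[m];
                }
            }
            // Newton step: solve hess * delta = grad, then beta += delta.
            double[] delta = solve(hess, grad);
            double maxStep = 0.0;
            for (int j = 0; j < k; j++) {
                beta[j] += delta[j];
                maxStep = Math.max(maxStep, Math.abs(delta[j]));
            }
            if (maxStep < precision) return beta;
        }
        throw new IllegalStateException("no convergence within " + maxIter + " iterations");
    }

    /** Gaussian elimination with partial pivoting; overwrites a and b. */
    static double[] solve(double[][] a, double[] b) {
        int k = b.length;
        for (int col = 0; col < k; col++) {
            int piv = col;
            for (int r = col + 1; r < k; r++)
                if (Math.abs(a[r][col]) > Math.abs(a[piv][col])) piv = r;
            if (Math.abs(a[piv][col]) < 1e-12)
                throw new ArithmeticException("singular Hessian (collinear predictors?)");
            double[] tr = a[col]; a[col] = a[piv]; a[piv] = tr;
            double tb = b[col]; b[col] = b[piv]; b[piv] = tb;
            for (int r = col + 1; r < k; r++) {
                double f = a[r][col] / a[col][col];
                b[r] -= f * b[col];
                for (int c = col; c < k; c++) a[r][c] -= f * a[col][c];
            }
        }
        double[] sol = new double[k];
        for (int r = k - 1; r >= 0; r--) {
            double s = b[r];
            for (int c = r + 1; c < k; c++) s -= a[r][c] * sol[c];
            sol[r] = s / a[r][r];
        }
        return sol;
    }
}
```

The singular-Hessian exception is where a prior tolerance check pays off: collinear predictors make the linear solve fail, so screening them out beforehand avoids the failure. Perfectly separable data will not converge with this sketch and ends in the IllegalStateException.
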
> 
> Such contributions would certainly be most welcome.
> 
> But care must be taken in how to fit those features into Commons Math. I mean
> that the new implementations should be integrated in the API of similar
> functionalities, if they currently exist.
> 
> IIUC, the proposal could be related to code currently in package
>   org.apache.commons.math3.stat.clustering
> and/or to the pending improvements suggested in this report:
>   https://issues.apache.org/jira/browse/MATH-748
> 
> [By the way, I wonder whether "clustering" should really be under "stat",
> rather than, say, "optimization" or a package of its own, one level up.]
> 
> In any case, it might be worth discussing here some design issues, before you
> start adapting your code. At the same time, you should open tickets on the
> bug tracking system:
>   https://issues.apache.org/jira/browse/MATH
> Preferably, there should be a general request for "New feature"; then
> several "sub-issues" could be linked to that one, each referring to a
> specific task (typically a class, with its unit tests).
> 
> > I have also included steps for checking tolerance so that we avoid
> > cases that fail to converge. Generally, the algorithm is not very
> > expensive in RAM (because I have approximated the Hessian matrix), and
> > the only external jar that I use is Commons Math, for matrix
> > multiplications.
> 
> Although the performance issue is certainly important, it is an
> "implementation detail" that should not preempt a clear API (i.e. one that
> reflects the mathematical concepts) and the reuse of existing classes (those
> can be improved at the same time, if your proposal reveals that something is
> lacking).
> 
> 
> Thanks for your interest,
> Gilles