On Fri, 2010-08-20 at 12:25 +0530, sambit rath wrote:
> Dear all,
>
> I am fairly new to R. I would like to perform a step-wise logit
> regression aiming to select a model on the basis of AIC. I am using
> some large datasets (up to a million rows and 97 variables). It is
> taking the 'step' function just too long to complete a single
> routine. Now, I have tried subsetting the data and perform the same
> thing. But, 'step' is time consuming still.
> Can there be a way out?
Rethink your model selection procedure. Look at ridge regression, the lasso and the elastic net (see the Machine Learning task view on CRAN: http://CRAN.R-project.org/view=MachineLearning).

Do you need all million rows? What do they gain you over a smaller, randomly selected subset? Your model fitted to the subset can then be checked against the cases omitted from fitting.

> Also, the datasets I am working with contain very few non-zero
> entries. Can a sparse function specification be used on step?

I don't think this is possible in R at the moment, but several people, including Doug Bates and Martin Maechler, are working on bringing sparse model matrices and fitting code into R. Doug and Martin's efforts are in the unreleased MatrixModels package on R-Forge:

https://r-forge.r-project.org/R/?group_id=61

but it is in active development, at a beta stage, and doesn't contain any stepwise procedures either. The latter isn't a problem, as you probably want to use a shrinkage method, as mentioned above, anyway.

HTH

G

>
> Thank you.
>
> Sambit
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
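For readers wondering what the suggested shrinkage approach looks like in practice, here is a minimal sketch. It assumes the CRAN package glmnet (one of the packages covered by the Machine Learning task view) is installed, and it uses simulated 0/1 data purely for illustration; the sizes and effect values are made up. The lasso penalty is tuned by cross-validation rather than by a sequence of AIC-driven steps, and predictors whose coefficients are shrunk exactly to zero drop out of the model.

```r
## Sketch only: assumes glmnet is installed; data are simulated, not real.
library(glmnet)

set.seed(1)
n <- 50000   # rows (illustrative)
p <- 50      # candidate predictors (illustrative)
## mostly-zero binary predictors, loosely mimicking the sparse data described
x <- matrix(rbinom(n * p, size = 1, prob = 0.05), nrow = n)
colnames(x) <- paste0("x", seq_len(p))
## in this simulation only the first three predictors truly matter
eta <- -2 + 1.5 * x[, 1] - 1.0 * x[, 2] + 0.8 * x[, 3]
y <- rbinom(n, size = 1, prob = plogis(eta))

## alpha = 1 is the lasso, alpha = 0 is ridge, 0 < alpha < 1 an elastic net;
## the penalty lambda is chosen by 10-fold cross-validation
cvfit <- cv.glmnet(x, y, family = "binomial", alpha = 1, nfolds = 10)

## coefficients at the largest lambda within one SE of the CV minimum;
## zero coefficients correspond to predictors dropped from the model
b <- as.matrix(coef(cvfit, s = "lambda.1se"))
selected <- rownames(b)[b[, 1] != 0]
selected
```

Changing `alpha` to, say, 0.5 gives an elastic-net fit with the same calls.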
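The "fit to a random subset, then check against the omitted cases" suggestion can be sketched with base R alone. This is an illustration under stated assumptions: the data are simulated, the subset size `n_fit` is a hypothetical choice, and held-out binomial deviance against an intercept-only model is just one reasonable way to do the check.

```r
## Sketch only: simulated data; n_fit is a hypothetical subset size.
set.seed(42)
n <- 100000
p <- 5
X <- matrix(rbinom(n * p, size = 1, prob = 0.05), nrow = n)  # mostly zeros
colnames(X) <- paste0("x", seq_len(p))
eta <- -2 + 1.5 * X[, "x1"] - 1 * X[, "x2"]
dat <- data.frame(y = rbinom(n, size = 1, prob = plogis(eta)), X)

## fit the logit model to a random subset of the rows only
n_fit <- 10000
fit_rows <- sample.int(n, n_fit)
fit <- glm(y ~ ., family = binomial, data = dat[fit_rows, ])

## mean binomial deviance of predicted probabilities on new cases
holdout_deviance <- function(prob, y) {
  -2 * mean(y * log(prob) + (1 - y) * log(1 - prob))
}

## check the model against the cases omitted from fitting,
## comparing with an intercept-only model fitted to the same subset
holdout <- dat[-fit_rows, ]
p_hat <- predict(fit, newdata = holdout, type = "response")
null_fit <- glm(y ~ 1, family = binomial, data = dat[fit_rows, ])
p_null <- predict(null_fit, newdata = holdout, type = "response")

holdout_dev <- holdout_deviance(p_hat, holdout$y)
null_dev <- holdout_deviance(p_null, holdout$y)
c(model = holdout_dev, null = null_dev)
```

A lower held-out deviance than the intercept-only model indicates that what was learned from the subset carries over to the rows it never saw.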