Thank you, Gavin! I am aware of the lasso regularization routine, but in this case my brief was to perform a stepwise AIC procedure. I guess fitting on a random subset of the data and validating the selected model against the remaining rows is the only answer.
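A minimal sketch of that subset-and-validate idea, assuming simulated data in place of the real data set. The names dat, train, holdout and the x1..x10 columns are hypothetical, and forward selection from the intercept-only model is only one possible choice of search direction:

```r
# Sketch only: simulated, mostly-zero predictors stand in for the real data.
set.seed(42)
n <- 20000
p <- 10
X <- matrix(rbinom(n * p, 1, 0.05), n, p)
colnames(X) <- paste0("x", seq_len(p))
eta <- -2 + 1.5 * X[, "x1"] - 1.0 * X[, "x2"]   # only x1 and x2 matter here
dat <- data.frame(y = rbinom(n, 1, plogis(eta)), X)

# Random subset for model selection; the remaining rows are held out.
train_idx <- sample(n, 5000)
train   <- dat[train_idx, ]
holdout <- dat[-train_idx, ]

# Forward stepwise selection by AIC, starting from the intercept-only logit model.
null_fit <- glm(y ~ 1, family = binomial, data = train)
upper    <- reformulate(colnames(X), response = "y")
sel <- step(null_fit, scope = list(lower = ~ 1, upper = upper),
            direction = "forward", trace = 0)

# Check the selected model against the held-out rows via binomial deviance,
# compared with an intercept-only prediction taken from the training subset.
p_hat       <- predict(sel, newdata = holdout, type = "response")
holdout_dev <- -2 * sum(dbinom(holdout$y, 1, p_hat, log = TRUE))
null_dev    <- -2 * sum(dbinom(holdout$y, 1, mean(train$y), log = TRUE))

print(formula(sel))
print(c(selected = holdout_dev, intercept_only = null_dev))
```

Starting from the null model keeps each step's candidate fits small, which may help when there are many candidate variables; whether it is fast enough on the real data would need to be tried.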
sambit

On 20 August 2010 14:43, Gavin Simpson <gavin.simp...@ucl.ac.uk> wrote:
> On Fri, 2010-08-20 at 12:25 +0530, sambit rath wrote:
>> Dear all,
>>
>> I am fairly new to R. I would like to perform a step-wise logit
>> regression aiming to select a model on the basis of AIC. I am using
>> some large datasets (up to a million rows and 97 variables). It is
>> taking the 'step' function just too long to complete a single
>> routine. Now, I have tried subsetting the data and performing the same
>> thing. But, 'step' is time consuming still.
>> Can there be a way out?
>
> Rethink your model selection procedure. Look at ridge regression and the
> lasso and elastic net procedures (See the Machine Learning task view on
> CRAN: http://CRAN.R-project.org/view=MachineLearning )
>
> Do you need all million rows? What do they gain you over using a
> smaller, randomly selected subset? Your model fitted to the subset can be
> confirmed against the cases omitted from fitting.
>
>> Also, the datasets I am working with contain very few non-zero
>> entries. Can a sparse function specification be used on step?
>
> I don't think this is possible at the moment in R, but several people,
> including Doug Bates and Martin Maechler, are working on bringing sparse
> model matrices and fitting code into R. Doug and Martin's efforts are in
> the unreleased MatrixModels package on R-Forge:
>
> https://r-forge.r-project.org/R/?group_id=61
>
> but it is in active development at a beta stage and doesn't contain any
> stepwise procedures either. The latter isn't a problem as you probably
> want to use a shrinkage method as mentioned above...
>
> HTH
>
> G
>
>>
>> Thank you.
>>
>> Sambit
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> --
> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
>  Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
>  ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
>  Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
>  Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
>  UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
>
>

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.