Thanks, Thomas. If I want to change the "k" factor (used in AIC-type procedures), is there a way to do that? Also, is there a way to force the selection to take only one "step" when building the model? (My aim is to create an adaptive procedure, and I am looking for a way to do that in leaps.)
Tal

On Thu, Mar 12, 2009 at 4:12 PM, Thomas Lumley <tlum...@u.washington.edu> wrote:

> If you run the example from ?biglm
>
> library(biglm)  # biglm() and its update() method
> library(leaps)  # regsubsets()
> data(trees)
> ff <- log(Volume) ~ log(Girth) + log(Height)
> chunk1 <- trees[1:10,]
> chunk2 <- trees[11:20,]
> chunk3 <- trees[21:31,]
> a <- biglm(ff, chunk1)
> a <- update(a, chunk2)
> a <- update(a, chunk3)
> summary(a)
>
> you can then do
>
> b <- regsubsets(a, method="forward")
> summary(b)
>
> to get the results of forward selection. In general, the biglm fit is the
> `maximum' model for the forward selection: all the variables that you want
> to consider for inclusion.
>
> -thomas
>
> On Thu, 12 Mar 2009, Tal Galili wrote:
>
>> Hello dear R-help members,
>>
>> I recently became interested in using biglm with leaps, and found myself
>> somewhat confused as to how to use the two together, in different
>> settings.
>>
>> I couldn't find any example code for the leaps package (except in the
>> help file, and the examples there are not as rich as they could be). That
>> is why I turn to you, in case you could share some good tips and examples
>> of code on how to use the leaps package (especially the regsubsets
>> command).
>>
>> The problem that drives me to ask this is: how to use the regsubsets()
>> command to emulate a forward model selection procedure on a regression
>> problem?
>>
>> I attach below a few directions dear Thomas has already written to me on
>> the subject, and any help would be very welcome:
>>
>> *me:*
>> I feel I am missing a big something here, so please help me here -
>> Let's say we have a dataset with an X matrix of 10 variables, and all we
>> want to perform is forward variable selection with AIC, starting from
>> the minimal model that includes the intercept only, and with the maximum
>> model of all variables and their interactions up to the second order.
>> In that range, we wish to find the best model, based on forward selection.
>>
>> *Thomas:*
>> Use biglm() to fit the model with all main effects and all second order
>> interactions. This model will be the maximum model for selection.
>>
>> The minimum model, by default, is the model with only an intercept, so you
>> don't need to specify anything. If the minimum model is more complicated,
>> the vector force.in specifies which terms are in the minimum model (a
>> logical vector with TRUE for variables in the minimum model and FALSE for
>> variables not in the minimum model).
>>
>> regsubsets() will give you the best model with one variable, the best with
>> two variables, and so on. The object produced by summary() of the
>> regsubsets() has a component $cp that gives Mallows' Cp for each of the
>> best models. This is equivalent to AIC, or you can compute AIC from the
>> residual sum of squares in the $rss component of the object.
>>
>> regsubsets() doesn't actually fit the models, it just works out the
>> residual sum of squares. You need to take the output of regsubsets() and
>> then fit whichever of the best models you want coefficients for.
>> summary(regsubsets.object)$which is a logical matrix indicating which
>> variables are in each of the best models.
>>
>> This may seem unnecessarily complicated, but regsubsets() was designed for
>> situations where you want lots of best models rather than just one, since
>> there are often lots of models that are about equally good. That's the
>> point of the plot() method, where you can look at hundreds of best models
>> from 30 or so variables and see which variables are in most of the good
>> models, and which variables tend to occur together or separately -- for
>> example, if you have two related variables such as systolic blood pressure
>> and diastolic blood pressure, do they substitute for each other or do they
>> tend to occur in the same model?
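The advice above (compute an AIC-type criterion from $rss, then refit the chosen model yourself) can be sketched roughly as follows. This is only an illustration under some assumptions: it uses the formula interface of regsubsets() on the trees data instead of a biglm fit, the helper ic() is a made-up name, and the criterion n * log(RSS / n) + k * p is the usual Gaussian linear-model form, correct only up to an additive constant that is the same for every model.

```r
library(leaps)  # regsubsets()

data(trees)
ff <- log(Volume) ~ log(Girth) + log(Height)

# Forward selection over the maximum model. The formula interface is used
# here (an assumption, to keep the sketch short) rather than a biglm object.
b <- regsubsets(ff, data = trees, method = "forward")
s <- summary(b)

n <- nrow(trees)
p <- rowSums(s$which)  # coefficients per model, intercept included

# Hypothetical helper: AIC-type criterion up to an additive constant.
# k = 2 corresponds to AIC, k = log(n) to BIC.
ic <- function(k) n * log(s$rss / n) + k * p

# Pick the model that minimises the criterion for the chosen k ...
best <- which.min(ic(2))
vars <- setdiff(names(which(s$which[best, ])), "(Intercept)")

# ... and refit it, since regsubsets() only returns residual sums of squares.
fit <- lm(reformulate(vars, response = ff[[2]]), data = trees)
coef(fit)
```

Changing the penalty is then just a matter of calling ic() with a different k, for example which.min(ic(log(n))) for a BIC-style choice. Because forward selection produces nested models, the first row of s$which is also the model a single forward step from the intercept-only model would select.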
>>
>> Thanks all (and again - thanks Thomas for all your patient answers so far)
>> Tal
>>
>> p.s: I already sent this e-mail once, but couldn't seem to see it on the
>> list, so I resent it again - sorry if any of you got it twice.
>>
>> ----------------------------------------------
>> My contact information:
>> Tal Galili
>> Phone number: 972-50-3373767
>> FaceBook: Tal Galili
>> My Blogs:
>> www.talgalili.com
>> www.biostatistics.co.il
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> Thomas Lumley                   Assoc. Professor, Biostatistics
> tlum...@u.washington.edu        University of Washington, Seattle