Re: [R] regsubsets() [leaps package] - please share some good examples of use

Tal Galili Thu, 12 Mar 2009 16:00:39 -0700

Thanks Thomas.
Assuming I want to change the "k" factor (used in AIC type procedures), is
there a way to do that ?
Also - is there a way to force the model to make only one "step" in the
creation of the model ?
(My aim is to be able to create an adaptive procedure, and I am looking for
a way of doing that in leaps)




Tal






On Thu, Mar 12, 2009 at 4:12 PM, Thomas Lumley <tlum...@u.washington.edu>wrote:

>
> If you run the example from ?biglm
>
> data(trees)
> ff<-log(Volume)~log(Girth)+log(Height)
> chunk1<-trees[1:10,]
> chunk2<-trees[11:20,]
> chunk3<-trees[21:31,]
> a <- biglm(ff,chunk1)
> a <- update(a,chunk2)
> a <- update(a,chunk3)
> summary(a)
>
> you can then do
>
> b <-regsubsets(a, method="forward")
> summary(b)
>
> to get the results of forward selection.  In general, the biglm fit is the
> `maximum' model for the forward selection: all the variables that you want
> to consider for inclusion.
>
>     -thomas
>
>
>
> On Thu, 12 Mar 2009, Tal Galili wrote:
>
>  Hello dear R-help members,
>>
>> I recently became interested in using biglm with leaps, and found myself
>> somewhat confused as to how to use the two together, in different
>> settings.
>>
>> I couldn't find any example codes for the leaps() package (except for in
>> the
>> help file, and the examples there are not as rich as they could be).  That
>> is why I turn to you in case you could share some good tips and examples
>> of
>> code on how to use the leaps package (especially the regsubsets command)
>>
>>
>> The problem that drives me to ask this is: how to use the regsubsets()
>> command to immulate a forward model selection procedure on a regressions
>> problem ?
>>
>> I attach below a few direction dear Thomas has already wrote to me on the
>> subject, and any help would be very welcomed:
>>
>> *me:*
>> I feel I am missing a big something here, so please help me here -
>> Let's say we have a dataset with an X matrix of 10 variables, and all we
>> want to perform is forward variable selection with AIC, starting from
>> the minimal model that includes the intercept only, and with the maximum
>> model of all variable and their interaction up to the second order.
>> In that range, we wish to find the best model, based on forward selection.
>>
>> *Thomas:*
>> Use biglm() to fit the model with all main effects and all second order
>> interactions.  This model will be the maximum model for selection.
>>
>> The minimum model, by default, is the model with only an intercept, so you
>> don't need to specify anything.  If the minimum model is more complicated,
>> the vector force.in specifies which terms are in the minimum model (a
>> logical vector with TRUE for variables in the minimum model and FALSE for
>> variables not in the minimum model).
>>
>> regsubsets() will give you the best model with one variable, the best with
>> two variables, and so on. The object produced by summary() of the
>> regsubsets() has a component $cp that gives Mallows' Cp for each of the
>> best
>> models. This is equivalent to AIC, or you can compute AIC from the
>> residual
>> sum of squares in the $rss component of the object.
>>
>> regsubsets() doesn't actually fit the models, it just works out the
>> residual
>> sum of squares. You need to take the output of regsubsets() and then fit
>> which ever of the best models you want coefficients for.
>> summary(regsubsets.object)$which is a logical matrix indicating which
>> variables are in each of the best models.
>> This may seem unnecessarily complicated, but regsubsets() was designed for
>> situations where you want lots of best models rather than just one, since
>> there are often lots of models that are about equally good.  That's the
>> point of the
>> plot() method, where you can look at hundreds of best models from 30 or so
>> variables and see which variables are in most of the good models, and
>> which
>> variables tend to occur together or separately -- for example, if you have
>> two related variables such as systolic blood pressure and diastolic blood
>> pressure do they substitute for each other or do they tend to occur in the
>> same model.
>>
>>
>>
>> Thanks all (and again - thanks Thomas for all your patient answers so far)
>> Tal
>>
>>
>>
>> p.s: I already sent this e-mail once, but couldn't seem to see it on the
>> list, so I resent it again - sorry if any of you got it twice.
>>
>>
>>
>> ----------------------------------------------
>>
>>
>> My contact information:
>> Tal Galili
>> Phone number: 972-50-3373767
>> FaceBook: Tal Galili
>> My Blogs:
>> www.talgalili.com
>> www.biostatistics.co.il
>>
>>        [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
> Thomas Lumley                   Assoc. Professor, Biostatistics
> tlum...@u.washington.edu        University of Washington, Seattle
>
>
>


-- 
----------------------------------------------


My contact information:
Tal Galili
Phone number: 972-50-3373767
FaceBook: Tal Galili
My Blogs:
www.talgalili.com
www.biostatistics.co.il

        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] regsubsets() [leaps package] - please share some good examples of use

Reply via email to