Karen,

Look at the help for the drop1() function (?drop1). There you will see:
"The hierarchy is respected when considering terms to be added or dropped:
all main effects contained in a second-order interaction must remain, and
so on."

So, for fit2, the step() function will only consider dropping a main
effect (e.g., X3) if no interactions involving that effect remain in the
model. That is why X3 is considered for dropping only after X1:X3 and
X2:X3 have been dropped in your example.

Jean

On Thu, Dec 5, 2013 at 11:27 PM, Karen Keating <keati...@ksu.edu> wrote:
> I am using the step() function to select a model via backward
> elimination, with AIC as the selection criterion. The full regression
> model contains three predictors, plus all the second-order terms and
> two-way interactions. The full model is fit via lm() using two different
> model formulae: one defines the second-order and interaction terms as
> explicit variables, and the other uses the I(x^2) and colon operators.
> The fit produced by lm() is exactly the same for both models, but when I
> pass these fitted models to the step() function, I get two different
> results. Apparently, step() does not recognize the three main predictors
> unless the second-order and interaction terms are explicitly defined as
> separate variables.
>
> I assigned this problem to my first-year graduate students, not realizing
> that R would give two different answers. Now I have to re-grade their
> homework, but I would really like to give them a reasonable explanation
> for the discrepancy.
>
> The complete code is given below.
>
> Could anyone shed some light on this mystery?
> Thanks in advance,
> Karen Keating
> Kansas State University
>
> # Exercise 9.13, Kutner, Nachtsheim, Neter & Li
> temp <- scan()
> 49.0 45.0 36.0 45.0
> 55.0 30.0 28.0 40.0
> 85.0 11.0 16.0 42.0
> 32.0 30.0 46.0 40.0
> 26.0 39.0 76.0 43.0
> 28.0 42.0 78.0 27.0
> 95.0 17.0 24.0 36.0
> 26.0 63.0 80.0 42.0
> 74.0 25.0 12.0 52.0
> 37.0 32.0 27.0 35.0
> 31.0 37.0 37.0 55.0
> 49.0 29.0 34.0 47.0
> 38.0 26.0 32.0 28.0
> 41.0 38.0 45.0 30.0
> 12.0 38.0 99.0 26.0
> 44.0 25.0 38.0 47.0
> 29.0 27.0 51.0 44.0
> 40.0 37.0 32.0 54.0
> 31.0 34.0 40.0 36.0
>
> dat <- matrix(temp, ncol = 4, nrow = length(temp)/4, byrow = TRUE)
> colnames(dat) <- c('Y', 'X1', 'X2', 'X3')
> dat <- data.frame(dat)
> attach(dat)
>
> # second-order terms and interactions
> X12 <- X1*X2
> X13 <- X1*X3
> X23 <- X2*X3
> X1sq <- X1^2
> X2sq <- X2^2
> X3sq <- X3^2
>
> fit1 <- lm(Y ~ X1sq + X2sq + X3sq + X1 + X2 + X3 + X12 + X13 + X23)
> fit2 <- lm(Y ~ I(X1^2) + I(X2^2) + I(X3^2) + X1 + X2 + X3 + X1:X2 + X1:X3 + X2:X3)
> sum(abs(fit1$res - fit2$res))  # 0, so the fitted models are the same
> dim(model.matrix(fit1))  # 19 x 10
> dim(model.matrix(fit2))  # 19 x 10
>
> dim(fit1$model)  # 19 x 10
> dim(fit2$model)  # 19 x 7 -- could this cause the discrepancy?
>
> back1 <- step(fit1, direction = 'backward')
> back2 <- step(fit2, direction = 'backward')
> # Note that 'back1' considers the three primary predictors X1, X2 and X3,
> # while 'back2' does not.
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
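A minimal sketch of the behavior Jean describes, using small hypothetical data rather than the exercise data above: when the interaction is written with the `:` operator, drop1() (and hence step()) respects the hierarchy, but a pre-computed product column is treated as just another main effect.

```r
# Same fitted model two ways; drop1() offers different candidate drops.
set.seed(1)
d <- data.frame(y = rnorm(20), x1 = rnorm(20), x2 = rnorm(20))
d$x12 <- d$x1 * d$x2                       # interaction as an explicit column

fa <- lm(y ~ x1 + x2 + x1:x2, data = d)    # interaction via the ':' operator
fb <- lm(y ~ x1 + x2 + x12,   data = d)    # identical fit, explicit column

all.equal(fitted(fa), fitted(fb))          # TRUE: the two fits are the same

# fa: hierarchy respected -- only the interaction can be dropped
rownames(drop1(fa))                        # "<none>" "x1:x2"

# fb: x12 looks like a main effect, so every term is a candidate
rownames(drop1(fb))                        # "<none>" "x1" "x2" "x12"
```

This is why back1 and back2 follow different elimination paths even though fit1 and fit2 are numerically identical.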