Karen,

Look at the help for the drop1() function (?drop1). There you will see:
"The hierarchy is respected when considering terms to be added or dropped:
all main effects contained in a second-order interaction must remain, and
so on."

So, for fit2, the step() function will only consider dropping a main
effect (e.g., X3) if no interactions involving that effect remain in the
model. That is why X3 is considered for dropping only after X1:X3 and
X2:X3 have been dropped in your example.

Jean

On Thu, Dec 5, 2013 at 11:27 PM, Karen Keating <keati...@ksu.edu> wrote:
> I am using the step() function to select a model via backward
> elimination, with AIC as the selection criterion. The full regression
> model contains three predictors, plus all the second-order terms and
> two-way interactions. The full model is fit via lm() using two different
> model formulae: one defines the second-order and interaction terms as
> explicit variables, and the other uses the I(x^2) and colon operators.
> The fit produced by lm() is exactly the same for both models, but when I
> pass these fitted models to the step() function, I get two different
> results. Apparently, step() does not recognize the three main predictors
> unless the second-order and interaction terms are explicitly defined as
> separate variables.
>
> I assigned this problem to my first-year graduate students, not realizing
> that R would give two different answers. Now I have to re-grade their
> homework, but I would really like to give them a reasonable explanation
> for the discrepancy.
>
> The complete code is given below.
>
> Could anyone shed some light on this mystery?
> Thanks in advance,
> Karen Keating
> Kansas State University
>
> # Exercise 9.13, Kutner, Nachtsheim, Neter & Li
> temp <- scan()
> 49.0 45.0 36.0 45.0
> 55.0 30.0 28.0 40.0
> 85.0 11.0 16.0 42.0
> 32.0 30.0 46.0 40.0
> 26.0 39.0 76.0 43.0
> 28.0 42.0 78.0 27.0
> 95.0 17.0 24.0 36.0
> 26.0 63.0 80.0 42.0
> 74.0 25.0 12.0 52.0
> 37.0 32.0 27.0 35.0
> 31.0 37.0 37.0 55.0
> 49.0 29.0 34.0 47.0
> 38.0 26.0 32.0 28.0
> 41.0 38.0 45.0 30.0
> 12.0 38.0 99.0 26.0
> 44.0 25.0 38.0 47.0
> 29.0 27.0 51.0 44.0
> 40.0 37.0 32.0 54.0
> 31.0 34.0 40.0 36.0
>
> dat <- matrix(temp, ncol = 4, nrow = length(temp)/4, byrow = TRUE)
> colnames(dat) <- c('Y', 'X1', 'X2', 'X3')
> dat <- data.frame(dat)
> attach(dat)
>
> # second-order terms and interactions
> X12 <- X1*X2
> X13 <- X1*X3
> X23 <- X2*X3
> X1sq <- X1^2
> X2sq <- X2^2
> X3sq <- X3^2
>
> fit1 <- lm(Y ~ X1sq + X2sq + X3sq + X1 + X2 + X3 + X12 + X13 + X23)
> fit2 <- lm(Y ~ I(X1^2) + I(X2^2) + I(X3^2) + X1 + X2 + X3 + X1:X2 + X1:X3 + X2:X3)
> sum(abs(fit1$res - fit2$res))  # 0, so the fitted models are the same
> dim(model.matrix(fit1))  # 19 x 10
> dim(model.matrix(fit2))  # 19 x 10
>
> dim(fit1$model)  # 19 x 10
> dim(fit2$model)  # 19 x 7 -- could this cause the discrepancy?
>
> back1 <- step(fit1, direction = 'backward')
> back2 <- step(fit2, direction = 'backward')
> # Note that 'back1' considers the three primary predictors X1, X2 and X3,
> # while 'back2' does not.
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
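A minimal sketch of the behavior Jean describes, using small hypothetical data rather than the exercise data above: when the interaction is written with the `:` operator, drop1() (and hence step()) respects the hierarchy, but a pre-computed product column is treated as just another main effect.

```r
# Same fitted model two ways; drop1() offers different candidate drops.
set.seed(1)
d <- data.frame(y = rnorm(20), x1 = rnorm(20), x2 = rnorm(20))
d$x12 <- d$x1 * d$x2                       # interaction as an explicit column

fa <- lm(y ~ x1 + x2 + x1:x2, data = d)    # interaction via the ':' operator
fb <- lm(y ~ x1 + x2 + x12,   data = d)    # identical fit, explicit column

all.equal(fitted(fa), fitted(fb))          # TRUE: the two fits are the same

# fa: hierarchy respected -- only the interaction can be dropped
rownames(drop1(fa))                        # "<none>" "x1:x2"

# fb: x12 looks like a main effect, so every term is a candidate
rownames(drop1(fb))                        # "<none>" "x1" "x2" "x12"
```

This is why back1 and back2 follow different elimination paths even though fit1 and fit2 are numerically identical.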