on 01/13/2009 09:32 AM Stefano Leonardi wrote: > Hallo, > yesterday I was puzzled when I discovered that I > probabliy miss something in the interepretation of intercept > in two-way lm models. > > I thought that the intercept, using the default contr.treatment > contrasts, represents the mean of the group of observations > having zero in all column of the model.matrix. > It turns out not to be case > > To be more more clear I am attaching a short example: > > R>#this is to generate the dataset > R>set.seed(1) #to have the same results I have > R>B <-rep(c("a","b","c"),c(6,6,6)) > R>nB <- 1:3 > R>names(nB)<-c("a","b","c") > R>B <- as.factor(B) > R>A<-rep(rep(c("0","1"),c(3,3)),3) > R>A <- as.factor(A) > R>Y <- 5+ 10*as.numeric(as.character(A)) + 20*nB[as.character(B)] > + rnorm(18,0,5) > R>#this is the generated dataset > R>data.frame(Y,A,B) > Y A B > 1 21.86773 0 a > 2 25.91822 0 a > 3 20.82186 0 a > 4 42.97640 1 a > 5 36.64754 1 a > 6 30.89766 1 a > 7 47.43715 0 b > 8 48.69162 0 b > 9 47.87891 0 b > 10 53.47306 1 b > 11 62.55891 1 b > 12 56.94922 1 b > 13 61.89380 0 c > 14 53.92650 0 c > 15 70.62465 0 c > 16 74.77533 1 c > 17 74.91905 1 c > 18 79.71918 1 c > > R>#table with means > R>M<-tapply(Y,list(A,B),mean) > > R>print(M) > a b c > 0 22.86927 48.00256 62.14832 > 1 36.84053 57.66039 76.47119 > > R>lm(Y ~ A + B) > > Call: > lm(formula = Y ~ A + B) > > Coefficients: > (Intercept) A1 Bb Bc > 23.53 12.65 22.98 39.45 > > > I was expecting that the intercept in the lm output would be equal > to the top left corner (0-a) of my previous M table (22.86927) > which is the mean of the first three observations in the > dataset. > > So how do I interpret the intercept 23.53? > > I could not relate it to any other mean. > > R>mean(Y[A=="0" & B=="a"]) > [1] 22.86927 > R>apply(M,1,mean) > 0 1 > 44.34005 56.99070 > R>apply(M,2,mean) > a b c > 29.85490 52.83148 69.30975 > > > The following of course gave me the same results as the lm call > > R>X<- model.matrix(~A+B) > R>b <- solve(t(X) %*% X) %*% t(X) %*% Y > R>b > [,1] > (Intercept) 23.52957 > A1 12.65066 > Bb 22.97658 > Bc 39.45485 > > > Thank you in advance for any suggestion. > > Stefano
You need to look at the means of the fitted data, not the means of the raw data. Thus, using 'DF' as the source data frame: > LM Call: lm(formula = Y ~ A + B, data = DF) Coefficients: (Intercept) A Bb Bc 23.53 12.65 22.98 39.45 # Get the model fitted Y values > fitted(LM) 1 2 3 4 5 6 7 8 23.52957 23.52957 23.52957 36.18023 36.18023 36.18023 46.50615 46.50615 9 10 11 12 13 14 15 16 46.50615 59.15681 59.15681 59.15681 62.98442 62.98442 62.98442 75.63508 17 18 75.63508 75.63508 # cbind them DF.fitted <- cbind(DF, F.lm = fitted(LM)) > DF.fitted Y A B F.lm 1 21.86773 0 a 23.52957 2 25.91822 0 a 23.52957 3 20.82186 0 a 23.52957 4 42.97640 1 a 36.18023 5 36.64754 1 a 36.18023 6 30.89766 1 a 36.18023 7 47.43715 0 b 46.50615 8 48.69162 0 b 46.50615 9 47.87891 0 b 46.50615 10 53.47306 1 b 59.15681 11 62.55891 1 b 59.15681 12 56.94922 1 b 59.15681 13 61.89380 0 c 62.98442 14 53.92650 0 c 62.98442 15 70.62465 0 c 62.98442 16 74.77533 1 c 75.63508 17 74.91905 1 c 75.63508 18 79.71918 1 c 75.63508 # Now get the means of the fitted values across # the combinations of A and B M <- with(DF.fitted, tapply(F.lm, list(A = A, B = B), mean)) > M B A a b c 0 23.52957 46.50615 62.98442 1 36.18023 59.15681 75.63508 Thus: # Intercept = *fitted* mean at A = 0; B = "a" > M["0", "a"] [1] 23.52957 # A > M["1", "a"] - M["0", "a"] [1] 12.65066 # Bb > M["1", "b"] - M["1", "a"] [1] 22.97658 # Bc > M["1", "c"] - M["1", "a"] [1] 39.45485 Alternatively, for the intercept: > predict(LM, newdata = data.frame(A = 0, B = "a")) 1 23.52957 See ?fitted and ?predict.lm HTH, Marc Schwartz ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.