On May 12, 2012, at 04:40 , Michael wrote: > Hi all, > > Could you please help me? > > I am trying to understand why this line works: > > lm1x = lm(y~X-1, tmp) > > Here it seems that I was combining the design matrix and the data frame... > > And X below is not a single column, in fact, it's a bunch of columns in > matrix form... > > I don't understand why this line works...
It works because you can have a matrix on the right hand side of a model formula and it will be interpreted as a set of columns... If X is constructed via model.matrix it usually contains an all-1 column, which is why you need to remove the intercept with "-1". A data frame can contain matrices, you just need to be careful that some functions, notably data.frame(), which will split a matrix into its constituent columns, unless protected with I(). Notice the difference between these two examples: > d <- data.frame(a=1:3,m=matrix(4:9,,2)) > d$m NULL > names(d) [1] "a" "m.1" "m.2" > d <- data.frame(a=1:3,m=I(matrix(4:9,,2))) > d$m [,1] [,2] [1,] 4 7 [2,] 5 8 [3,] 6 9 > names(d) [1] "a" "m" (d <- data.frame(a=1:3) ; d$m <- matrix(4:9,,2) is like the 2nd version) Your example has the weakness that you do "tmp$X <- X" so that you have X both in the data frame tmp and in the global workspace, and it is really not clear that the one in the data frame is the one used. I am pretty sure that this is in fact the case, but for your peace of mind, you should try renaming one of them. > > Is it just luck, i.e. if we change the data-set and/or formulas to > something else, this will potentially fail? > (that's something I would like to catch and avoid...) > > Thank you! > > ------------------------------- > > The data is located at: > http://www.ling.uni-potsdam.de/~vasishth/book.html > In the section: > downloadable errata, code, datasets > ERRATA > Corrected Pages > VasishthBroebook.R > beauty.txt > mathachieve.txt > mathachschool.txt > > ------------------------------ > > MathAchieve <- read.table("mathachieve.txt") > > colnames(MathAchieve) <- c("School", "Minority", "Sex", "SES", "MathAch", > "MEANSES") > > head(MathAchieve) > > > MathAchSchool <- read.table("mathachschool.txt") > colnames(MathAchSchool) <- c("School", "Size", "Sector", "PRACAD", > "DISCLIM", "HIMINTY", "MEANSES") > MathScores <- merge(MathAchieve, MathAchSchool, by = "School") > > > lm1 = lm(MathAch ~ SES + factor(Sector) , MathScores) > > X=model.matrix(MathAch ~ SES+factor(Sector) , MathScores) > y=MathScores$MathAch > > tmp=MathScores > tmp$y=y > tmp$X=X > > lm1x = lm(y~X-1, tmp) > > plot(fitted(lm1), fitted(lm1x)) > > max(abs(fitted(lm1) - fitted(lm1x))) > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd....@cbs.dk Priv: pda...@gmail.com ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.