It's a feature and it's been there forever. (It's even present in another system not unlike R.)
Suppose you set

    y <- matrix(1:3)

and construct

    dfr <- data.frame(x=1:3, y)

Then you invoke the constructor function, data.frame, which by default simplifies things like matrices to single columns, naming them as necessary.

Now if you directly modify dfr by adding another component, like

    dfr$yy <- y

You bypass the constructor function and its default simplifications, but you do not bypass the structure tests. This is, in fact the simplest way to put a matrix inside a data frame intact, but it must have the same number of rows as has the data frame itself.

There are other ways of getting a matrix into a data frame intact, and sometimes it is mildly useful to do this. Consider, for example, the following:

    dfr <- within(data.frame(x = 1:5), {
        y <- rbinom(5, 100, plogis((x-3)/2))
        SF <- cbind(S = y, F = 100-y)
        rm(y)
    })
    names(dfr)   ### Note the apparent discrepancy
    dfr          ### with the printed version.
    (fm <- glm(SF ~ x, binomial, dfr))

Bill Venables
http://www.cmis.csiro.au/bill.venables/

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Daryl Morris
Sent: Wednesday, 13 August 2008 11:31 AM
To: r-help@r-project.org
Subject: [R] issue building dataframes with matrices.

Hello,

Is this a bug or a feature? I am using R 2.7.1 on Apple OS X.

> y <- matrix(1:3,nrow=3) # y is a single-column matrix
> df <-data.frame(x=1:3,y=y)
> sapply(df,data.class)
        x         y
"numeric" "numeric"
> df$yy <- y
> sapply(df,data.class)
        x         y        yy
"numeric" "numeric"  "matrix"

I'm not sure why dataframes are allowed to have matrices as members. It's also weird to me that y & yy have different classes. It seems like there has been a blurring of the line between lists and dataframes. When did dataframes start taking members other than vectors?

This is an issue if one for example builds a dataframe to fit a model, and then subsequently wants to use predict. You have to work a bit to avoid a type mismatch error.
> df$out = df$x+df$y+df$yy + rnorm(3)
> df
  x y yy       out
1 1 1  1  3.066348
2 2 2  2  5.516017
3 3 3  3 11.073452
> glmout = glm(out~x+y+yy,data=df)
> predict(glmout,newdata=data.frame(x=1:3,y=1:3,yy=1:3))
Error: variable 'yy' was fitted with type "nmatrix.1" but type "numeric" was supplied
>
> predict(glmout,newdata=data.frame(x=1:3,y=1:3,yy=matrix(1:3)))
Error: variable 'yy' was fitted with type "nmatrix.1" but type "numeric" was supplied
> predict(glmout,newdata=df[,-4])
        1         2         3
 2.548387  6.551939 10.555491
Warning message:
In predict.lm(object, newdata, se.fit, scale = 1, type = ifelse(type == :
  prediction from a rank-deficient fit may be misleading

I'm not really looking for a "solution", as I can already identify several workarounds. I guess I'm mainly trying to figure out what the philosophy is here.

This is also weird to me:

> df$yy <- as.data.frame(y)
> df
  x y V1       out
1 1 1  1  3.066348
2 2 2  2  5.516017
3 3 3  3 11.073452
> glmout = glm(out~x+y+V1,data=df)
Error in eval(expr, envir, enclos) : object "V1" not found
> glmout = glm(out~x+y+yy,data=df)
Error in model.frame.default(formula = out ~ x + y + yy, data = df, drop.unused.levels = TRUE) :
  invalid type (list) for variable 'yy'
> glmout = glm(out~x+y+yy$VI,data=df)
Error in model.frame.default(formula = out ~ x + y + yy$VI, data = df, :
  invalid type (NULL) for variable 'yy$VI'

Is it impossible to build a model from a dataframe built this way?

thanks,
Daryl Morris
(Biostatistics, Univ. of Washington)

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.