> When read into a data.frame, R defaults to reading character strings as
> factors. If you don't want that, use option stringsAsFactors = FALSE.

This is somewhat tangential, but if you plan on using predict(fit, newdata=nd)
after fitting a model like fit <- lm(y~x, data=d), be sure to convert the
character columns in both nd and d into factors first. Otherwise you are
likely to get errors from predict(). You will get a warning when fitting the
model if you use character columns, but the results are ok until you use
predict() on the result. E.g.,

  > d <- data.frame(y=1:10, cGroup=rep(c("A","B","C"),c(3,4,3)),
  +                 fGroup=factor(rep(c("A","B","C"),c(3,4,3))), stringsAsFactors=FALSE)
  > fitChar <- lm(y ~ cGroup - 1, data=d[1:9,])
  Warning message:
  In model.matrix.default(mt, mf, contrasts) :
    variable 'cGroup' converted to a factor
  > fitFactor <- lm(y ~ fGroup - 1, data=d[1:9,])
  > coef(fitChar)
  cGroupA cGroupB cGroupC
      2.0     5.5     8.5
  > coef(fitFactor)
  fGroupA fGroupB fGroupC
      2.0     5.5     8.5
  > # so far things are ok, but ...
  > predict(fitChar, newdata=d[10,])
  Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
    contrasts can be applied only to factors with 2 or more levels
  In addition: Warning message:
  In model.matrix.default(Terms, m, contrasts.arg = object$contrasts) :
    variable 'cGroup' converted to a factor
  > predict(fitFactor, newdata=d[10,])
   10
  8.5
  > predict(fitChar, newdata=d[c(1,10),])
  Error in predict.lm(fitChar, newdata = d[c(1, 10), ]) :
    subscript out of bounds
  In addition: Warning message:
  In model.matrix.default(Terms, m, contrasts.arg = object$contrasts) :
    variable 'cGroup' converted to a factor
  > predict(fitFactor, newdata=d[c(1,10),])
    1  10
  2.0 8.5

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Rui Barradas
> Sent: Tuesday, October 23, 2012 11:16 AM
> To: asafwe
> Cc: r-help@r-project.org
> Subject: Re: [R] Data type in a data frame
>
> Hello,
>
> When read into a data.frame, R defaults to reading character strings as
> factors. If you don't want that, use option stringsAsFactors = FALSE.
> Using your dataset,
>
> dat1 <- read.table(text = "
> Observation Gender Dosage Alertness
> 1 m a 8
> 2 m a 12
> 3 m a 13
> 4 m a 12
> 5 m b 6
> 6 m b 7
> ", header = TRUE)
> str(dat1)
>
> dat2 <- read.table(text = "
> Observation Gender Dosage Alertness
> 1 m a 8
> 2 m a 12
> 3 m a 13
> 4 m a 12
> 5 m b 6
> 6 m b 7
> ", header = TRUE, stringsAsFactors = FALSE)
> str(dat2)
>
> The default is taken from the following option, which you can change:
>
> options("stringsAsFactors")
>
> Hope this helps,
>
> Rui Barradas
>
> On 23-10-2012 15:43, asafwe wrote:
> > Hi all,
> >
> > How does R know to regard a variable as a factor and not a character?
> > For example, consider the following table:
> >
> > Observation Gender Dosage Alertness
> > 1           m      a      8
> > 2           m      a      12
> > 3           m      a      13
> > 4           m      a      12
> > 5           m      b      6
> > 6           m      b      7
> >
> > When read into a dataframe, will "m", "a", "b" be regarded as a factor or as
> > a character? How does R decide?
> >
> > Thanks a lot in advance,
> >
> > Asaf
> >
> > --
> > View this message in context:
> > http://r.789695.n4.nabble.com/Data-type-in-a-data-frame-tp4647161.html
> > Sent from the R help mailing list archive at Nabble.com.
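
One defensive way to follow the advice about predict() is to convert the character column to a factor once, with explicit levels, before fitting. The rows used for fitting and any rows later passed as newdata then share the same level set. What follows is only a minimal sketch on the toy data from the example; the explicit level vector and the names fit and p are illustrative choices, not something from the thread. Note also that since R 4.0.0 data.frame() and read.table() default to stringsAsFactors = FALSE, so character columns are now the usual starting point.

```r
# Toy data as in the example; cGroup starts out as a character column.
d <- data.frame(y = 1:10,
                cGroup = rep(c("A", "B", "C"), c(3, 4, 3)),
                stringsAsFactors = FALSE)

# Convert once, with explicit levels (an illustrative choice), so that any
# subset of d carries the full level set even if it holds only one group.
d$cGroup <- factor(d$cGroup, levels = c("A", "B", "C"))

# Fit on the first nine rows, then predict for the held-out tenth row.
fit <- lm(y ~ cGroup - 1, data = d[1:9, ])
p <- predict(fit, newdata = d[10, ])
print(p)
```

Because d[10, ] keeps all three levels of cGroup, predict() can build the same design matrix columns that were used when fitting.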

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.