> PS. Here are two interrelated reasons we don't autoconvert:
>
> 1. Subject id. Factors give no advantage for a unique id, and some clear
> problems. In particular when one creates a subset - everyone over 60 say -
> there is no good reason to remember all the ids you didn't select.
> 2. Subject id. I work on a lot of studies of fractures and fracture risk. A
> time-trend model might be
>     gam(fracture ~ subject + x1 + x2 + ..., subset=(sex=='F'))
>
> Fracture risk for males and females is so different that separate models are
> the sensible thing. If subject is a factor before the call, then my model
> has a zillion unneeded levels. There are other ways out of this issue, but
> avoiding factors is the easiest.

3. Factors take up more memory than character vectors.

(This is tongue-in-cheek, but in recent versions of R, factor variables
take up (very very slightly) more memory than character variables. It's a
common myth that the opposite is true.)

I think R's handling of character vectors has progressed to the point
where they should be the norm, not the exception. Maybe others will
have different views.

Hadley

-- 
http://had.co.nz/
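
Both points in this thread can be checked in base R. The following is a minimal sketch, not code from either message: the data frame `d`, its columns, and the age cutoff are hypothetical, made up for illustration. It shows that a factor id keeps a level for every original subject after subsetting, and how one might compare memory use with `object.size()`. The reported sizes depend on R version and platform, so no particular result is claimed here.

```r
# Hypothetical data: 1000 subjects, each with a unique character id.
set.seed(1)
n <- 1000
d <- data.frame(
  subject = sprintf("id%04d", seq_len(n)),
  age = sample(40:90, n, replace = TRUE),
  stringsAsFactors = FALSE
)
d$subject_f <- factor(d$subject)  # same ids, stored as a factor

# Subset to everyone over 60.
older <- d[d$age > 60, ]

# The character column only knows the ids that were selected ...
n_chr <- length(unique(older$subject))
# ... while the factor still carries a level for every original subject,
n_lev <- nlevels(older$subject_f)
# unless the unused levels are dropped explicitly.
n_used <- nlevels(droplevels(older$subject_f))

print(c(selected = n_chr, levels_kept = n_lev, levels_used = n_used))

# Memory comparison for the full columns, in bytes.
# Results vary with R version and platform.
size_chr <- object.size(d$subject)
size_fct <- object.size(d$subject_f)
print(c(character = size_chr, factor = size_fct))
```

Calling `droplevels()` after each subset is one of the "other ways out" of the unneeded-levels problem. Keeping ids as character vectors avoids needing it at all.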