Note that changing this does not just mean getting rid of "silly warnings".
Currently, predict.lm() can give wrong answers when stringsAsFactors is FALSE.
> d <- data.frame(x=1:10, f=rep(c("A","B","C"), c(4,3,3)),
+                 y=c(1:4, 15:17, 28.1,28.8,30.1))
> fit_ab <- lm(y ~ x + f, data = d, subset = f!="B")
Warning message:
In model.matrix.default(mt, mf, contrasts) :
  variable 'f' converted to a factor
> predict(fit_ab, newdata=d)
 1  2  3  4  5  6  7  8  9 10
 1  2  3  4 25 26 27  8  9 10
Warning messages:
1: In model.matrix.default(Terms, m, contrasts.arg = object$contrasts) :
  variable 'f' converted to a factor
2: In predict.lm(fit_ab, newdata = d) :
  prediction from a rank-deficient fit may be misleading
fit_ab is not rank-deficient, and predict() should report
 1  2  3  4 NA NA NA 28 29 30
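
For anyone who wants those values today, here is a minimal sketch of a
workaround, not a fix to predict.lm() itself. It assumes you make f a factor
yourself and predict only for rows whose level the fit has seen; the names d2,
fit2, seen and pred are mine, purely for illustration. How predict() treats
unseen levels may differ between R versions, so the sketch never passes such
rows to it.

```r
# Same data as above, but with f stored as a factor from the start.
d2 <- data.frame(x = 1:10,
                 f = factor(rep(c("A", "B", "C"), c(4, 3, 3))),
                 y = c(1:4, 15:17, 28.1, 28.8, 30.1))

# lm() drops the unused level "B" when the subset removes it.
fit2 <- lm(y ~ x + f, data = d2, subset = f != "B")

# Rows whose level was present when the model was fitted.
seen <- d2$f %in% fit2$xlevels$f

# Predict only for those rows; leave NA where the level is unknown.
pred <- rep(NA_real_, nrow(d2))
pred[seen] <- predict(fit2, newdata = droplevels(d2[seen, ]))
names(pred) <- rownames(d2)

# Should reproduce the values given above, up to floating-point rounding.
print(round(pred, 6))
```

Rows 5 to 7 stay NA because the fit carries no information about level "B".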
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Terry Therneau
> Sent: Monday, February 11, 2013 5:50 AM
> To: [email protected]; Duncan Murdoch
> Subject: Re: [Rd] stringsAsFactors
>
> I think your idea to remove the warnings is excellent, and a good compromise.
> Characters already work fine in modeling functions except for the silly
> warning.
>
> It is interesting how often the defaults for a program reflect the data sets
> in use at the time the defaults were chosen. There are some such in my own
> survival package whose proper value is no longer as "obvious" as it was when
> I chose them. Factors are very handy for variables which have only a few
> levels and will be used in modeling. Every character variable of every
> dataset in "Statistical Models in S", which introduced factors, is of this
> type so auto-transformation made a lot of sense. The "solder" data set there
> is one for which Helmert contrasts are proper so guess what the default
> contrast option was? (I think there are only a few data sets in the world
> for which Helmert makes sense, however, and R eventually changed the
> default.)
>
> For character variables that should not be factors, such as a street address,
> stringsAsFactors can be a real PITA, and I expect that people's preference
> for the option depends almost entirely on how often these arise in their own
> work. As long as there is an option that can be overridden I'm okay. Yes, I'd
> prefer FALSE as the default, partly because the current value is a tripwire
> in the hallway that eventually catches every new user.
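
To make the street address case concrete, here is a small sketch of the
tripwire. The data and the names addr, d_fac and d_chr are made up for
illustration; stringsAsFactors is passed explicitly so the result does not
depend on whatever default your version of R uses.

```r
addr <- c("12 Elm St", "4 Oak Ave", "12 Elm St")

d_fac <- data.frame(addr = addr, stringsAsFactors = TRUE)
d_chr <- data.frame(addr = addr, stringsAsFactors = FALSE)

class(d_fac$addr)   # "factor"
class(d_chr$addr)   # "character"

# Correcting an address is painless in the character column ...
d_chr$addr[2] <- "7 Pine Rd"

# ... but in the factor column the new string is not an existing level,
# so R issues a warning and stores NA instead.
d_fac$addr[2] <- "7 Pine Rd"
is.na(d_fac$addr[2])   # TRUE
```

Whether that behaviour is a safeguard or a nuisance depends on whether the
column really is categorical.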
>
> Terry Therneau
>
> On 02/11/2013 05:00 AM, [email protected] wrote:
> > Both of these were discussed by R Core. I think it's unlikely the
> > default for stringsAsFactors will be changed (some R Core members like
> > the current behaviour), but it's fairly likely the show.signif.stars
> > default will change. (That's if someone gets around to it: I
> > personally don't care about that one. P-values are commonly used
> > statistics, and the stars are just a simple graphical display of them.
> > I find some p-values to be useful, and the display to be harmless.)
> >
> > I think it's really unlikely the more extreme changes (i.e. dropping
> > show.signif.stars completely, or dropping p-values) will happen.
> >
> > Regarding stringsAsFactors: I'm not going to defend keeping it as is,
> > I'll let the people who like it defend it. What I will likely do is
> > make a few changes so that character vectors are automatically changed
> > to factors in modelling functions, so that operating with
> > stringsAsFactors=FALSE doesn't trigger silly warnings.
>
> ______________________________________________
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel