Re: [Rd] stringsAsFactors

2013-02-11 Thread Terry Therneau
Peter, I had an earlier response to Duncan that I should have copied to the list. The subset issue can be fixed. When the model changes character to factor, it needs to remember the levels, just as it does with factors. We are simply seeing a reprise of problems that occurred when mode
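
A minimal sketch of what "remembering the levels" means, using made-up data. The names train, newdata, train_levels and stable_new are hypothetical and do not come from the thread; this illustrates the idea only and is not the fix that went into R.

```r
# Hypothetical data: 'train' stands in for the rows used to fit the model,
# 'newdata' for the rows passed to predict() later.
train   <- c("A", "A", "B", "B")
newdata <- c("B", "C")

# Converting each character vector on its own gives each its own level set,
# so the integer code behind "B" differs between the two factors.
naive_train <- factor(train)     # levels: A, B
naive_new   <- factor(newdata)   # levels: B, C

# Remembering the training levels keeps the coding consistent. A value that
# was not seen at fit time becomes NA instead of taking another level's code.
train_levels <- levels(naive_train)
stable_new   <- factor(newdata, levels = train_levels)

as.integer(naive_new)    # codes under the new data's own levels
as.integer(stable_new)   # codes under the remembered training levels
```

With the remembered levels, "B" keeps the code it had at fit time, and the unseen "C" shows up as a missing value rather than being silently treated as a different level.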

Re: [Rd] stringsAsFactors

2013-02-11 Thread peter dalgaard
On Feb 11, 2013, at 18:50, Duncan Murdoch wrote:

> I do think that it's unfortunate that we don't get the same result in both cases, and I'd like to have gotten the predictions you suggested, but I don't think that's going to happen. The reason for the difference is that the subsett

Re: [Rd] stringsAsFactors

2013-02-11 Thread Duncan Murdoch
On 11/02/2013 2:34 PM, Terry Therneau wrote:

The root of this problem is that the .getXlevels function does not return the levels for character variables.

Thanks, that looks easy to fix (not by changing .getXlevels, but by having model.frame convert the character variables, instead of waiting

Re: [Rd] stringsAsFactors

2013-02-11 Thread Brian Diggs
On 2/11/2013 5:50 AM, Terry Therneau wrote: I think your idea to remove the warnings is excellent, and a good compromise. Characters already work fine in modeling functions except for the silly warning. It is interesting how often the defaults for a program reflect the data sets in use at the t

Re: [Rd] stringsAsFactors

2013-02-11 Thread Terry Therneau
The root of this problem is that the .getXlevels function does not return the levels for character variables. Future predictions depend on that information.

On 02/11/2013 11:50 AM, Duncan Murdoch wrote:

On 11/02/2013 12:13 PM, William Dunlap wrote:

Note that changing this does not just mean ge
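
For readers who want to see where that information lives, here is a small sketch. It reuses the data frame from the example quoted in this thread, but makes f a factor explicitly so the result does not depend on the stringsAsFactors default. The names fit, lev_direct, d_chr and fit_chr are my own. What the character-column fit stores depends on the R version in use, so no output is claimed for it.

```r
# Data frame from the example in the thread, with f made a factor explicitly.
d <- data.frame(x = 1:10,
                f = factor(rep(c("A", "B", "C"), c(4, 3, 3))),
                y = c(1:4, 15:17, 28.1, 28.8, 30.1))

fit <- lm(y ~ x + f, data = d)

# For a factor predictor, lm() stores the levels seen at fit time.
fit$xlevels

# The same information, obtained by calling .getXlevels() directly on the
# fitted model's terms and model frame.
lev_direct <- .getXlevels(terms(fit), model.frame(fit))

# Same model with f as a character column. Whether xlevels records anything
# for f here depends on the R version, which is the point under discussion.
d_chr <- d
d_chr$f <- as.character(d$f)
fit_chr <- lm(y ~ x + f, data = d_chr)
fit_chr$xlevels
```

predict() relies on the stored xlevels to rebuild the factor coding for newdata, which is why a missing entry for a character variable matters.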

Re: [Rd] stringsAsFactors

2013-02-11 Thread Duncan Murdoch
On 11/02/2013 12:13 PM, William Dunlap wrote:

Note that changing this does not just mean getting rid of "silly warnings". Currently, predict.lm() can give wrong answers when stringsAsFactors is FALSE.

> d <- data.frame(x=1:10, f=rep(c("A","B","C"), c(4,3,3)), y=c(1:4, 15:17, 28.1,28.8,30.1))

Re: [Rd] stringsAsFactors

2013-02-11 Thread William Dunlap
Note that changing this does not just mean getting rid of "silly warnings". Currently, predict.lm() can give wrong answers when stringsAsFactors is FALSE.

> d <- data.frame(x=1:10, f=rep(c("A","B","C"), c(4,3,3)), y=c(1:4, 15:17, 28.1,28.8,30.1))
> fit_ab <- lm(y ~ x + f, data = d, subset = f

Re: [Rd] stringsAsFactors

2013-02-11 Thread Terry Therneau
I think your idea to remove the warnings is excellent, and a good compromise. Characters already work fine in modeling functions except for the silly warning. It is interesting how often the defaults for a program reflect the data sets in use at the time the defaults were chosen. There are so