[R] any r package can handle factor levels not in the test set
It looks like gbm and glm both have this issue.

I wonder whether any R package is immune to it?

In practice, it is very common for test data to contain values that were not seen in the training data. Does this mean I have to give up on R?

Thanks!

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] any r package can handle factor levels not in the test set
Thanks for your reply. But I cannot control the data. I am dealing with real-world stream data. It is very common for the test data (the data you apply the model to for prediction) to contain new values that were not seen in the training data.

If I coded this myself, I would return a random guess or just the intercept in such a situation. But it seems most R packages return an error and exit.

On Mon, Jan 12, 2015 at 6:08 PM, Richard M. Heiberger wrote:
> You need to define the levels of the training set to include all
> levels that you might see.
> Something like this
>
> > A <- factor(letters[1:5])
> > B <- factor(letters[c(1,3,5,7,9)])
> > A
> [1] a b c d e
> Levels: a b c d e
> > B
> [1] a c e g i
> Levels: a c e g i
> > training <- factor(A, levels=unique(c(levels(A), levels(B))))
> > training
> [1] a b c d e
> Levels: a b c d e g i
>
> In the future please "provide commented, minimal, self-contained,
> reproducible code."
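The fallback described above ("just an intercept" for unseen values) can be sketched in a few lines of R. This is only an illustration under stated assumptions: the toy data are made up, `predict_with_fallback` is a hypothetical helper rather than a function from any package, and the training mean of the response stands in for an intercept-only prediction. It relies on the `xlevels` component that `lm` and `glm` fits store; other packages such as gbm may keep their training levels elsewhere.

```r
# Toy data: x is a factor predictor; the value "g" never occurs in training.
train <- data.frame(
  x = factor(c("a", "b", "c", "a", "b", "c")),
  y = c(1.0, 2.0, 3.0, 1.2, 2.2, 3.2)
)
test <- data.frame(x = c("a", "c", "g"), stringsAsFactors = FALSE)

fit <- lm(y ~ x, data = train)

# Hypothetical helper: predict as usual for rows with known levels and
# return `fallback` for rows whose factor value was not seen in training.
predict_with_fallback <- function(fit, newdata, var, fallback) {
  seen <- fit$xlevels[[var]]
  # factor() with explicit levels turns unseen values into NA
  newdata[[var]] <- factor(as.character(newdata[[var]]), levels = seen)
  known <- !is.na(newdata[[var]])
  out <- rep(fallback, nrow(newdata))
  if (any(known)) {
    out[known] <- predict(fit, newdata = newdata[known, , drop = FALSE])
  }
  out
}

# Fall back to the training mean of y for unseen levels.
pred <- predict_with_fallback(fit, test, "x", fallback = mean(train$y))
print(pred)
```

Whether a mean fallback is acceptable depends on the application; returning `NA` for those rows instead is an equally simple choice and makes the unseen cases visible downstream.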
Re: [R] any r package can handle factor levels not in the training set
Sorry, I noticed that the email subject was not accurate. To be specific, when I call predict, I get error messages like

factor x has new levels 1, 2

Here x is an attribute (independent variable), not the outcome. I wonder whether the incremental packages (if any) solve this problem? Maybe it is time to write my own package.
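For readers who want to reproduce the error reported in this thread, the following sketch triggers it on made-up data and then lists which incoming values are new. The data and variable names are assumptions for illustration; the only package-specific detail used is the `xlevels` component of a `glm` fit.

```r
# Toy training data: x takes the values "3", "4", "5".
train <- data.frame(
  x = factor(c("3", "4", "5", "3", "4", "5")),
  y = c(0.1, 0.9, 2.1, 0.2, 1.1, 1.9)
)
# Incoming rows contain "1" and "2", which were never seen in training.
new_rows <- data.frame(x = factor(c("1", "2", "3")))

fit <- glm(y ~ x, data = train)  # default gaussian family

# predict() stops with an error when newdata has unseen factor levels;
# capture the message instead of letting the script exit.
msg <- tryCatch(
  {
    predict(fit, newdata = new_rows)
    NA_character_
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Compare the incoming values with the levels stored in the fit.
unseen <- setdiff(as.character(new_rows$x), fit$xlevels[["x"]])
print(unseen)
```

Running a check like the `setdiff()` line before calling `predict()` lets a streaming job decide what to do with the affected rows rather than failing on the whole batch.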
[R] how to read scientific notation numbers
Hi,

Sorry to bother you with a question. I have a file in which each line looks like this:

6.5500e+004 2.82350001e+000 3.2000e+001 1.1580e+003 2.4400e+002 5.9800e+002 2.2700e+002 3.9031e+001 -1.2137e+002

However, when I use read.table, it does not read the file correctly: it reads in 18 variables instead of 9. I am so frustrated. I searched the archive, but it seems nobody else has had this problem.

Thank you!
U
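As a starting point for diagnosing this, the sketch below writes the sample line from the message to a temporary file, counts the fields per line, and reads it back. It assumes the file is plain text with whitespace-separated values and no header. The post does not say what caused the 18 columns, so this only shows that three-digit exponents by themselves parse as ordinary numbers; on the real file, `count.fields()` output that differs from 9 would point to a separator or encoding problem.

```r
# The sample line from the message, split in two only for readability.
line <- paste(
  "6.5500e+004 2.82350001e+000 3.2000e+001 1.1580e+003 2.4400e+002",
  "5.9800e+002 2.2700e+002 3.9031e+001 -1.2137e+002"
)

# Write two copies to a temporary file to mimic reading from disk.
path <- tempfile(fileext = ".txt")
writeLines(c(line, line), path)

# count.fields() reports how many fields each line has for a given
# separator; sep = "" means any run of whitespace.
n_fields <- count.fields(path, sep = "")
print(n_fields)

# Read all columns as numeric; there is no header row.
dat <- read.table(path, header = FALSE, sep = "", colClasses = "numeric")
print(dim(dat))
print(dat[1, 1])

unlink(path)
```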