Re: [R] GLM: What is a good way for dealing with new factor levels in the test set?

2015-05-04 Thread thuksu
For anyone who is looking for an answer to this in the future... I went for "imputation". It's a way of filling in missing values based on what you see elsewhere in the data. In my case, I simply drew a sample of the categorical variable from the rest of the test set. Some may argue that this is erro
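
A minimal sketch of the sampling approach described above, not the poster's actual code. The data frames `train` and `test` and their contents are made up for illustration; only the column name `mytypeid.f` comes from the thread. It assumes at least one test row carries a level the model has already seen.

```r
set.seed(1)  # only so the sketch is reproducible

# Hypothetical toy data; the column name mytypeid.f is taken from the thread.
train <- data.frame(mytypeid.f = factor(c("a", "b", "a", "b", "a", "b")))
test  <- data.frame(mytypeid.f = factor(c("a", "zzz", "b", "a")))

known  <- levels(train$mytypeid.f)
values <- as.character(test$mytypeid.f)
unseen <- !(values %in% known)          # rows whose level the model never saw

# Draw replacements from the test-set values that do have a known level.
pool <- values[!unseen]
values[unseen] <- sample(pool, sum(unseen), replace = TRUE)

# Rebuild the factor with exactly the training levels.
test$mytypeid.f <- factor(values, levels = known)
```

Sampling from the observed values keeps the imputed column's distribution close to the rest of the test set, but it also treats the unseen level as if it carried no information of its own.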

Re: [R] GLM: What is a good way for dealing with new factor levels in the test set?

2015-04-30 Thread thuksu
Hi,

Thanks for the reply! I did try this...

    # res is a data frame
    levels(res$mytypeid.f) <- c(levels(res$mytypeid.f), "mynewlevel")
    logreg <- glm(yesno ~ mytypeid.f + amount, data = res, family = "binomial")
    exp(coef(logreg))
    # this result shows that the new level is not included in the regression.
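
The behaviour reported above can be reproduced on made-up data. A likely explanation is that `glm()` drops unused factor levels when it builds the model frame, so a level with no training rows gets no coefficient and is not recorded in the fit. The data below are hypothetical; only the variable names come from the thread.

```r
set.seed(42)

# Hypothetical training data using the variable names from the thread.
res <- data.frame(
  yesno      = rep(c(0, 1), times = 20),
  mytypeid.f = factor(rep(c("a", "b"), each = 20)),
  amount     = runif(40)
)

# Declare an extra level that no training row actually uses.
levels(res$mytypeid.f) <- c(levels(res$mytypeid.f), "mynewlevel")

logreg <- glm(yesno ~ mytypeid.f + amount, data = res, family = "binomial")

names(coef(logreg))         # no coefficient for "mynewlevel"
logreg$xlevels$mytypeid.f   # the fit only remembers the levels it saw

# Predicting for a row with the new level therefore still fails.
new_row <- data.frame(
  mytypeid.f = factor("mynewlevel", levels = levels(res$mytypeid.f)),
  amount     = 0.5
)
err <- tryCatch(predict(logreg, newdata = new_row),
                error = function(e) conditionMessage(e))
err
```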

[R] GLM: What is a good way for dealing with new factor levels in the test set?

2015-04-29 Thread thuksu
My training set and my test set have some factor levels that are different. It's rare, but it occurs. What is a good way of dealing with this? I don't want to throw away the entire row from the data frame, because there is some valuable information in there. Is there some way to say somethi
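
Before choosing a remedy, it can help to see how many rows are affected. The sketch below lists the levels that appear only in the test set and counts the rows that use them. The data frames `train` and `test` are hypothetical stand-ins; only the column name `mytypeid.f` comes from the thread.

```r
# Hypothetical toy data; the column name mytypeid.f is taken from the thread.
train <- data.frame(mytypeid.f = factor(c("a", "b", "a", "b")))
test  <- data.frame(mytypeid.f = factor(c("a", "c", "b", "c", "d")))

# Levels present in the test set but absent from the training set.
new_levels <- setdiff(levels(test$mytypeid.f), levels(train$mytypeid.f))

# Logical flag for the test rows that carry one of those levels.
affected <- test$mytypeid.f %in% new_levels

new_levels      # levels the model has never seen
sum(affected)   # number of test rows involved
```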