Hi! I'm using the Naive Bayes classifier provided by the e1071 package (http://cran.r-project.org/web/packages/e1071), and I've noticed that the predict function behaves differently when the factor levels of the columns used for prediction differ from those used for fitting. From inspecting predict.naiveBayes, I came to the conclusion that this is due to the conversion of factors to their internal integer codes by the data.matrix function: a factor with a different level set assigns different codes to the same value. For example, consider the following piece of R code:
> library(mlbench)
> library(e1071)
> data(HouseVotes84)
> model <- naiveBayes(Class ~ ., data = HouseVotes84)
> head(HouseVotes84)
       Class   V1 V2 V3   V4   V5 V6 V7 V8 V9 V10  V11  V12 V13 V14 V15  V16
1 republican    n  y  n    y    y  y  n  n  n   y <NA>    y   y   y   n    y
2 republican    n  y  n    y    y  y  n  n  n   n    n    y   y   y   n <NA>
3   democrat <NA>  y  y <NA>    y  y  n  n  n   n    y    n   y   y   n    n
4   democrat    n  y  y    n <NA>  y  n  n  n   n    y    n   y   n   n    y
5   democrat    y  y  y    n    y  y  n  n  n   n    y <NA>   y   y   y    y
6   democrat    n  y  y    n    y  y  n  n  n   n    n    n   y   y   y    y
> predict(model, HouseVotes84[1,-1])
[1] republican
Levels: democrat republican
> new.data <- data.frame(V1="n", V2="y", V3="n", V4="y", V5="y", V6="y",
+                        V7="n", V8="n", V9="n", V10="y", V11=NA_character_,
+                        V12="y", V13="y", V14="y", V15="n", V16="y",
+                        stringsAsFactors=TRUE)
> predict(model, new.data)
[1] democrat
Levels: democrat republican

Note that new.data holds exactly the same values as HouseVotes84[1,-1], yet the two predictions differ. I haven't used other classification methods in R, so I'm unsure whether this is what predict is expected to do. Is this a bug or the expected behavior?

Thanks!

-- 
Joao.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
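The code mismatch described in this message can be reproduced with base R alone. The sketch below is an illustration, not e1071's own code: it shows that data.matrix() encodes the same value "y" as 2 or as 1 depending on the factor's level set, and it defines a hypothetical helper, align_levels() (not part of e1071), that rebuilds each factor column of the new data with the training levels before predict is called. It assumes, as concluded above, that predict.naiveBayes works on these integer codes.

```r
# Integer codes that data.matrix() assigns to a single factor column
codes_of <- function(f) as.vector(data.matrix(data.frame(v = f)))

# Training-style factor: both values occur, so the levels are c("n", "y")
train_col <- factor(c("n", "y", "y", "n"))
# Factor built from a single new value: its only level is "y"
new_col <- factor("y")

train_codes <- codes_of(train_col)  # "n" -> 1, "y" -> 2
new_codes <- codes_of(new_col)      # "y" -> 1, i.e. the training code of "n"
# Re-leveling with the training levels restores the expected code for "y"
aligned_codes <- codes_of(factor(new_col, levels = levels(train_col)))

print(train_codes)
print(new_codes)
print(aligned_codes)

# Hypothetical helper (assumption, not an e1071 function): give every factor
# column of `newdata` the same levels as the matching column of `traindata`.
# Values that are not among the training levels become NA.
align_levels <- function(newdata, traindata) {
  for (nm in intersect(names(newdata), names(traindata))) {
    if (is.factor(traindata[[nm]])) {
      newdata[[nm]] <- factor(as.character(newdata[[nm]]),
                              levels = levels(traindata[[nm]]))
    }
  }
  newdata
}
```

With the objects from the message, the call would be predict(model, align_levels(new.data, HouseVotes84)). Since new.data would then carry both the same values and the same levels as HouseVotes84[1,-1], the two predictions should agree.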