Hi! I think I had some issues with the charset on the previous message, so I'm sending this again. Sorry for the double post.
I'm using the Naive Bayes classifier provided by the e1071 package (http://cran.r-project.org/web/packages/e1071), and I've noticed that the predict function behaves differently when the level set of the columns used for prediction differs from that of the columns used for fitting. From inspecting predict.naiveBayes, I came to the conclusion that this is due to the conversion of factors to their internal codes by the data.matrix function.

For example, consider the following piece of R code:

> library(mlbench)
> library(e1071)
> data(HouseVotes84)
> model <- naiveBayes(Class ~ ., data = HouseVotes84)
> head(HouseVotes84)
       Class   V1 V2 V3   V4   V5 V6 V7 V8 V9 V10  V11  V12 V13 V14 V15  V16
1 republican    n  y  n    y    y  y  n  n  n   y <NA>    y   y   y   n    y
2 republican    n  y  n    y    y  y  n  n  n   n    n    y   y   y   n <NA>
3   democrat <NA>  y  y <NA>    y  y  n  n  n   n    y    n   y   y   n    n
4   democrat    n  y  y    n <NA>  y  n  n  n   n    y    n   y   n   n    y
5   democrat    y  y  y    n    y  y  n  n  n   n    y <NA>   y   y   y    y
6   democrat    n  y  y    n    y  y  n  n  n   n    n    n   y   y   y    y
> predict(model, HouseVotes84[1,-1])
[1] republican
Levels: democrat republican
> new.data <- data.frame(V1="n", V2="y", V3="n", V4="y", V5="y", V6="y",
+                        V7="n", V8="n", V9="n", V10="y", V11=NA_character_,
+                        V12="y", V13="y", V14="y", V15="n", V16="y",
+                        stringsAsFactors=TRUE)
> predict(model, new.data)
[1] democrat
Levels: democrat republican

Here new.data holds the same values as the first row of HouseVotes84, yet the predicted class is different. I haven't used other classification methods in R, so I'm unsure whether this is what is expected from the predict function. Is this a bug or the expected behavior?

Thanks!

--
Joao.
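
To make the suspected mechanism concrete, here is a minimal base-R sketch that does not need e1071. The names train and new_row are made up for illustration. It only shows how data.matrix maps factors to integer codes; that predict.naiveBayes then uses those codes for lookup is my reading of the function, not something this snippet proves.

```r
# Hypothetical training column with both levels present: "n" -> 1, "y" -> 2.
train <- data.frame(V1 = factor(c("n", "y", "y"), levels = c("n", "y")))

# A one-row data frame built from scratch: its factor has the single level "y".
new_row <- data.frame(V1 = "y", stringsAsFactors = TRUE)

# data.matrix replaces each factor by its internal integer codes.
codes_train <- as.integer(data.matrix(train))    # 1 2 2
code_before <- as.integer(data.matrix(new_row))  # 1, the code "n" has in train

# Re-create the factor with the training levels so the codes line up again.
new_row$V1 <- factor(new_row$V1, levels = levels(train$V1))
code_after <- as.integer(data.matrix(new_row))   # 2
```

If this is indeed the cause, building each column of new.data with the levels of the matching training column, for example factor("y", levels = levels(HouseVotes84$V2)), should presumably make the two predictions agree.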