Hi!

I think I had some issues with the charset on the previous message, so I'm
sending this again. Sorry for the double post.

I'm using the Naive Bayes classifier provided by the e1071 package
(http://cran.r-project.org/web/packages/e1071), and I've noticed that the
predict function gives different results when the factor levels of the
columns used for prediction differ from those used for fitting. From
inspecting predict.naiveBayes, I came to the conclusion that this is due
to the conversion of factors to their internal codes by the data.matrix
function. For example, consider the following piece of R code:

> library(mlbench)
> library(e1071)
> data(HouseVotes84)
> model <- naiveBayes(Class ~ ., data = HouseVotes84)
> head(HouseVotes84)
       Class   V1 V2 V3   V4   V5 V6 V7 V8 V9 V10  V11  V12 V13 V14 V15  V16
1 republican    n  y  n    y    y  y  n  n  n   y <NA>    y   y   y   n    y
2 republican    n  y  n    y    y  y  n  n  n   n    n    y   y   y   n <NA>
3   democrat <NA>  y  y <NA>    y  y  n  n  n   n    y    n   y   y   n    n
4   democrat    n  y  y    n <NA>  y  n  n  n   n    y    n   y   n   n    y
5   democrat    y  y  y    n    y  y  n  n  n   n    y <NA>   y   y   y    y
6   democrat    n  y  y    n    y  y  n  n  n   n    n    n   y   y   y    y
> predict(model, HouseVotes84[1,-1])
[1] republican
Levels: democrat republican
> new.data <- data.frame(V1="n", V2="y", V3="n", V4="y", V5="y", V6="y",
V7="n", V8="n", V9="n", V10="y", V11=NA_character_, V12="y", V13="y",
V14="y", V15="n", V16="y", stringsAsFactors=TRUE)
> predict(model, new.data)
[1] democrat
Levels: democrat republican
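
The level mismatch itself can be reproduced with base R alone. The sketch
below is only an illustration and assumes my reading of predict.naiveBayes
is right; the names train, code.train, code.new, aligned and code.aligned
are made up for this example. It shows that data.matrix assigns a different
integer code to the same value "y" when the factor levels differ, and that
rebuilding the factor with the training levels makes the codes agree again.

```r
# A column as it appears in the fitting data: a factor with levels "n" and "y".
train <- data.frame(V1 = factor(c("n", "y", "n"), levels = c("n", "y")))

# A column built from scratch: the factor only knows the value it contains.
new.data <- data.frame(V1 = "y", stringsAsFactors = TRUE)

levels(train$V1)     # "n" "y"
levels(new.data$V1)  # "y"

# data.matrix() replaces factors by their internal integer codes.
code.train <- as.integer(data.matrix(train)[2, "V1"])    # code of "y" in train
code.new   <- as.integer(data.matrix(new.data)[1, "V1"]) # code of "y" in new.data

# Possible workaround (an assumption, not part of e1071): re-create every
# factor in the new data with the levels of the matching training column.
aligned <- new.data
for (col in names(aligned)) {
  aligned[[col]] <- factor(as.character(aligned[[col]]),
                           levels = levels(train[[col]]))
}
code.aligned <- as.integer(data.matrix(aligned)[1, "V1"])

print(c(train = code.train, new = code.new, aligned = code.aligned))
```

Here "y" is coded as 2 in train but as 1 in new.data, which is the code of
"n" in train. After the levels are aligned, "y" is coded as 2 again.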

Both calls use the same values (new.data reproduces row 1 of HouseVotes84),
yet the predictions differ. I haven't used other classification methods in
R, so I'm unsure whether this is how the predict function is meant to work.
Is this a bug or the expected behavior?

Thanks!

--
Joao.
