I am running Weka's command line from R with calls to system(), like this:

> system("java weka.classifiers.bayes.NaiveBayes -K -t HWlrTrain.arff -o")

=== Confusion Matrix ===

    a    b   <-- classified as
 3518  597 |    a = NoSpray
  644  926 |    b = Spray

=== Stratified cross-validation ===

=== Confusion Matrix ===

    a    b   <-- classified as
 3512  603 |    a = NoSpray
  653  917 |    b = Spray

So far, no surprises, except that I might have expected a few more
misclassifications in the cross-validation.

However, if I use the same data in R

> train.df <- read.arff("HWlrTrain.arff")

using RWeka, like this:

> NB <- make_Weka_classifier("weka/classifiers/bayes/NaiveBayes")
> wNB <- NB(decision ~ ., data = train.df,
+           control = Weka_control(K = TRUE))
> summary(wNB)

=== Summary ===

Correctly Classified Instances        4437               78.0475 %
Incorrectly Classified Instances      1248               21.9525 %
Kappa statistic                          0.4446
Mean absolute error                      0.2679
Root mean squared error                  0.3924
Relative absolute error                 67.0055 %
Root relative squared error             87.7545 %
Coverage of cases (0.95 level)          97.9244 %
Mean rel. region size (0.95 level)      83.0519 %
Total Number of Instances             5685

=== Confusion Matrix ===

    a    b   <-- classified as
 3520  595 |    a = NoSpray
  653  917 |    b = Spray

The resulting confusion matrix is different from both the training and
the cross-validation matrices from Weka's command line.

Somewhat ironically, if I use the model to predict on test data, like
this,

> predict(wNB, test.df)

I do get exactly the same result as I would from the Weka CLI.

Maybe the difference isn't important, but I would have expected the two
approaches to do exactly the same thing.

Any possible explanations?

-- 
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.
   ___    Patrick Connolly
 {~._.~}                   Great minds discuss ideas
 _( Y )_                 Average minds discuss events
(:_~*~_:)                  Small minds discuss people
 (_)-(_)                              ..... Eleanor Roosevelt
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.