[R] Weka on command line c.f. using RWeka

Patrick Connolly Sun, 11 Nov 2012 23:56:01 -0800

Running Weka from the command line via calls to system(), like this:

> system("java weka.classifiers.bayes.NaiveBayes -K -t HWlrTrain.arff -o")


=== Confusion Matrix ===

    a    b   <-- classified as
 3518  597 |    a = NoSpray
  644  926 |    b = Spray

=== Stratified cross-validation ===


=== Confusion Matrix ===

    a    b   <-- classified as
 3512  603 |    a = NoSpray
  653  917 |    b = Spray

So far, no surprises, except that I might have expected a few more
misclassifications in the cross-validation.

However, if I use the same data in R with RWeka, like this:

> library(RWeka)
> train.df <- read.arff("HWlrTrain.arff")
> NB <- make_Weka_classifier("weka/classifiers/bayes/NaiveBayes")
> wNB <- NB(decision ~ ., data = train.df,
+           control = Weka_control(K = TRUE))
> summary(wNB)

=== Summary ===

Correctly Classified Instances        4437               78.0475 %
Incorrectly Classified Instances      1248               21.9525 %
Kappa statistic                          0.4446
Mean absolute error                      0.2679
Root mean squared error                  0.3924
Relative absolute error                 67.0055 %
Root relative squared error             87.7545 %
Coverage of cases (0.95 level)          97.9244 %
Mean rel. region size (0.95 level)      83.0519 %
Total Number of Instances             5685     

=== Confusion Matrix ===

    a    b   <-- classified as
 3520  595 |    a = NoSpray
  653  917 |    b = Spray

The resulting confusion matrix is different from both the training and
the cross-validation matrices from Weka's command line.
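
To put the three matrices side by side, the counts quoted above can be
typed into R directly. This is only a sketch in base R: the helper name
cm() and the object names are made up for illustration, and nothing
here calls Weka.

```r
# Hypothetical helper: build a 2 x 2 confusion matrix from counts given row by row.
cm <- function(counts) {
  matrix(counts, nrow = 2, byrow = TRUE,
         dimnames = list(actual = c("NoSpray", "Spray"),
                         predicted = c("NoSpray", "Spray")))
}

# Counts copied from the output quoted in this post.
cli.train <- cm(c(3518, 597, 644, 926))  # Weka CLI, training data
cli.cv    <- cm(c(3512, 603, 653, 917))  # Weka CLI, stratified cross-validation
rweka     <- cm(c(3520, 595, 653, 917))  # RWeka, summary(wNB)

# Correctly classified instances sit on the diagonal.
correct <- sapply(list(cli.train = cli.train, cli.cv = cli.cv, rweka = rweka),
                  function(m) sum(diag(m)))
print(correct)

# Cell-by-cell differences between the RWeka matrix and each CLI matrix.
print(rweka - cli.train)
print(rweka - cli.cv)
```

All three matrices sum to 5685 instances. The "Spray" row from RWeka is
identical to the CLI cross-validation row, while its "NoSpray" row
differs from the CLI training row by two instances.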

Somewhat ironically, if I use the model to predict on test data, like
this:

> predict(wNB, test.df)

I get exactly the same result as I would from the Weka CLI.

Maybe the difference isn't important, but I would have expected the
two approaches to do exactly the same thing.

Any possible explanations?
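
One way to narrow this down might be to request both evaluations
explicitly from RWeka instead of relying on whatever summary() does by
default. The sketch below refits the model as above and uses RWeka's
evaluate_Weka_classifier(). The choice of numFolds = 10 and seed = 1 is
my assumption about the Weka CLI defaults, not something verified here,
and the fold assignment may still differ between the two routes.

```r
library(RWeka)

# Refit as in the post (assumes HWlrTrain.arff is in the working directory).
train.df <- read.arff("HWlrTrain.arff")
NB  <- make_Weka_classifier("weka/classifiers/bayes/NaiveBayes")
wNB <- NB(decision ~ ., data = train.df, control = Weka_control(K = TRUE))

# Evaluation on the training data (numFolds = 0 means no cross-validation).
ev.train <- evaluate_Weka_classifier(wNB, numFolds = 0)

# 10-fold cross-validation; folds and seed are assumed to match the CLI defaults.
ev.cv <- evaluate_Weka_classifier(wNB, numFolds = 10, seed = 1)

# Confusion matrices to compare against the two CLI matrices.
print(ev.train$confusionMatrix)
print(ev.cv$confusionMatrix)
```

If ev.train matches the first CLI matrix and ev.cv the second, the
discrepancy would lie in how summary() is evaluating the model rather
than in the model itself.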



-- 
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.   
   ___    Patrick Connolly   
 {~._.~}                   Great minds discuss ideas    
 _( Y )_                 Average minds discuss events 
(:_~*~_:)                  Small minds discuss people  
 (_)-(_)                              ..... Eleanor Roosevelt
          
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.

