[R] statistical significance of accuracy increase in classification

Monica Pisica Tue, 24 Feb 2009 08:23:43 -0800

Hi everyone,

I would like to test the statistical significance (for what it's worth ...) of 
an increase in classification accuracy and kappa statistic between different 
land classifications. The classifications were done using other software (such 
as eCognition and See5), but the results were "sampled" at locations where I 
know the "reference" class. Using the "caret" package I built a confusion 
matrix for each classification. For now I am interested in the overall results, 
which include the overall classification accuracy and the kappa statistic, 
among others. Depending on which classification I test, I get a small increase 
in accuracy and a somewhat larger increase in kappa. I wonder if there is a way 
to run a statistical significance test for the accuracy and kappa increase 
between the two classifications.


Data example and some code:

library(caret)
 
ref <- c(15, 13, 13, 13, 13, 15, 14, 14, 14, 15, 13, 13, 13, 15, 13, 13, 13, 
15, 13, 13, 13, 13, 13, 13, 13,13, 14, 13, 13, 13, 13, 13, 13, 13, 15, 13, 13, 
15, 13, 15, 13, 13, 15, 13, 13, 13, 13, 13, 13, 13,13, 13, 13, 13, 13, 15, 13, 
13, 13, 13, 13, 13, 15, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13,13, 14, 
13, 13, 13, 13, 13, 14, 14, 15, 15, 13, 13, 13, 13, 13, 15, 13, 13, 13, 13, 13, 
13, 13, 13,13, 13, 14, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 
15, 13, 13, 13, 13, 13, 13, 13,13, 13, 13, 13, 13, 13, 13, 14, 13, 13, 13, 13, 
13, 13, 15, 13, 13, 13, 13, 13, 13)

class1 <- c(14, 14, 13, 13, 13, 15, 13, 14, 15, 14, 14, 13, 14, 13, 13, 13, 13, 
13, 13, 13, 13, 13, 13, 14, 13,13, 13, 13, 13, 13, 13, 13, 13, 13, 15, 13, 14, 
13, 13, 14, 13, 13, 15, 13, 13, 13, 13, 13, 13, 13,13, 13, 15, 21, 13, 15, 13, 
21, 13, 13, 14, 13, 15, 13, 15, 13, 13, 14, 13, 13, 13, 13, 13, 13, 13,13, 14, 
14, 13, 13, 13, 13, 15, 15, 15, 15, 13, 13, 13, 13, 13, 5, 13, 15, 13, 13, 13, 
13, 13, 13,15, 13, 15, 14, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 
13, 13, 13, 13, 13, 13, 13, 13,13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 
13, 13, 13, 13, 13, 13, 13, 13, 13)

class2 <- c(14, 15, 13, 13, 13, 15, 13, 14, 15, 15, 14, 13, 14, 13, 13, 13, 13, 
13, 13, 13, 13, 13, 13, 14, 13,13, 13, 13, 13, 13, 13, 13, 13, 13, 15, 13, 14, 
13, 13, 15, 13, 13, 15, 14, 13, 13, 13, 13, 13, 13,13, 13, 15, 13, 13, 15, 13, 
21, 13, 13, 13, 13, 15, 13, 15, 15, 13, 14, 13, 13, 13, 13, 13, 13, 15,13, 14, 
14, 13, 13, 13, 13, 15, 14, 15, 15, 13, 14, 13, 13, 13, 15, 13, 15, 13, 13, 13, 
13, 13, 13,15, 13, 15, 14, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 
22, 13, 13, 13, 13, 13, 13, 13,13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 
13, 13, 13, 13, 13, 13, 13, 13, 13)

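# Use one common set of levels so both confusion tables are square and comparable
ref1 <- factor(ref, levels = c(5, 13, 14, 15, 21, 22))
pred1 <- factor(class1, levels = c(5, 13, 14, 15, 21, 22))
pred2 <- factor(class2, levels = c(5, 13, 14, 15, 21, 22))

# Rows are predicted classes, columns are reference classes
t1 <- table(pred1, ref1)
t2 <- table(pred2, ref1)

# Overall accuracy, kappa and related statistics for each classification
cm1 <- confusionMatrix(t1)
cm1$overall

cm2 <- confusionMatrix(t2)
cm2$overall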

As you can see, the increase in accuracy is very small, but the increase in 
kappa is a little more substantial. Is this increase statistically significant?
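
One possible way to approach this, offered as a rough sketch rather than a 
definitive answer: because both classifications are scored at the same 
reference locations, the comparison is paired. The code below assumes 
McNemar's test (mcnemar.test in base R) on the correct/incorrect outcomes for 
the accuracy difference, and a paired percentile bootstrap over locations for 
the kappa difference. The helper names overall_kappa and 
compare_classifications and the toy vectors are hypothetical and are not part 
of caret; kappa here is plain unweighted Cohen's kappa computed by hand.

```r
# Sketch only: compare two classifications scored at the same reference sites.
# overall_kappa() and compare_classifications() are hypothetical helper names.

# Unweighted Cohen's kappa from predicted and reference labels
overall_kappa <- function(pred, ref) {
  pred <- as.character(pred)
  ref  <- as.character(ref)
  lv   <- union(ref, pred)                        # common class set
  tab  <- table(factor(pred, levels = lv), factor(ref, levels = lv))
  n    <- sum(tab)
  po   <- sum(diag(tab)) / n                      # observed agreement
  pe   <- sum(rowSums(tab) * colSums(tab)) / n^2  # chance agreement
  (po - pe) / (1 - pe)
}

compare_classifications <- function(ref, pred_a, pred_b,
                                    n_boot = 2000, seed = 1) {
  stopifnot(length(ref) == length(pred_a), length(ref) == length(pred_b))
  ref    <- as.character(ref)
  pred_a <- as.character(pred_a)
  pred_b <- as.character(pred_b)

  # Paired accuracy: McNemar's test on the 2x2 correct/incorrect table
  ok_a <- factor(pred_a == ref, levels = c(FALSE, TRUE))
  ok_b <- factor(pred_b == ref, levels = c(FALSE, TRUE))
  mc   <- mcnemar.test(table(ok_a, ok_b))

  # Paired kappa: resample locations, recompute the kappa difference each time
  set.seed(seed)
  n <- length(ref)
  d <- replicate(n_boot, {
    i <- sample.int(n, n, replace = TRUE)
    overall_kappa(pred_b[i], ref[i]) - overall_kappa(pred_a[i], ref[i])
  })

  list(mcnemar       = mc,
       kappa_diff    = overall_kappa(pred_b, ref) - overall_kappa(pred_a, ref),
       kappa_diff_ci = quantile(d, c(0.025, 0.975), na.rm = TRUE))
}

# Toy data, made up for illustration only
ref_toy <- c(13, 13, 14, 15, 13, 14, 15, 13, 13, 14)
a_toy   <- c(13, 14, 14, 13, 13, 13, 15, 13, 15, 14)
b_toy   <- c(13, 13, 14, 15, 13, 13, 15, 13, 15, 14)

res <- compare_classifications(ref_toy, a_toy, b_toy, n_boot = 500)
res$mcnemar$p.value   # paired test of the accuracy difference
res$kappa_diff        # kappa(b) - kappa(a)
res$kappa_diff_ci     # 95% percentile bootstrap interval
```

With the vectors from this post, and assuming they all have the same length, 
the call would be compare_classifications(ref, class1, class2). A bootstrap 
interval for the kappa difference that excludes 0 would suggest the increase 
is more than sampling noise. With only a few discordant locations, an exact 
binomial test on the discordant pairs may be preferable to the chi-square 
approximation used by mcnemar.test.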

Thanks for any help,
 
Monica
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
