Re: [R] statistical significance of accuracy increase in classification

Monica Pisica Tue, 24 Feb 2009 09:49:52 -0800

Hi again,
 
Looking further into test statistics, I realized that maybe I can use 
power.prop.test to see whether the difference between the two accuracies is 
zero or not. Do you have any comments on that? Also, should I consider the 
kappa statistic a kind of proportion as well and use the same test? If this 
does not violate any important assumption, then ....
 
power.prop.test(n = 146, p1 = 0.7877, p2 = 0.8014, strict = TRUE)
 
     Two-sample comparison of proportions power calculation 
              n = 146
             p1 = 0.7877
             p2 = 0.8014
      sig.level = 0.05
          power = 0.0596356
    alternative = two.sided
 NOTE: n is number in *each* group


which seems to tell me that the two accuracies are barely different .... 
although the 0.06 reported here is the power of the test, not a p-value, so 
it cannot be compared with 0.05 directly.
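
A minimal sketch of what an actual significance test (rather than a power 
calculation) might look like. The counts 115 and 117 are what the reported 
accuracies imply for n = 146 (115/146 = 0.7877, 117/146 = 0.8014). prop.test 
treats the two groups as independent, which is not the case here since both 
classifications are checked at the same locations, so a paired test such as 
mcnemar.test may be more appropriate. The helper paired_table and the toy_* 
vectors are hypothetical and only illustrate the call.

```r
# Correct predictions implied by the reported accuracies (n = 146 each).
n <- 146
correct <- c(class1 = 115, class2 = 117)

# Two-sample test of equal proportions. Unlike power.prop.test(), this
# returns a p-value. It assumes the two samples are independent.
indep <- prop.test(x = correct, n = c(n, n))
indep$p.value

# Paired alternative: cross-tabulate right/wrong outcomes of the two
# classifications at the same locations (hypothetical helper).
paired_table <- function(ref, pred1, pred2) {
  ok1 <- factor(pred1 == ref, levels = c(TRUE, FALSE),
                labels = c("right", "wrong"))
  ok2 <- factor(pred2 == ref, levels = c(TRUE, FALSE),
                labels = c("right", "wrong"))
  table(class1 = ok1, class2 = ok2)
}

# Hypothetical toy vectors, NOT the data from this thread.
toy_ref   <- c(13, 13, 14, 15, 13, 15, 13, 14, 13, 13)
toy_pred1 <- c(13, 14, 14, 13, 13, 15, 13, 13, 13, 15)
toy_pred2 <- c(13, 13, 14, 15, 13, 15, 14, 13, 13, 13)

tab <- paired_table(toy_ref, toy_pred1, toy_pred2)
tab

# McNemar's test uses only the discordant cells (one right, the other wrong).
paired <- mcnemar.test(tab)
paired$p.value
```

With the real data, paired_table(ref, class1, class2) would give the 2 x 2 
table to pass to mcnemar.test.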
 
For the kappa statistic it would be:
 
power.prop.test(n = 146, p1 = 0.3675, p2 = 0.4315, strict = TRUE)
 
     Two-sample comparison of proportions power calculation 
              n = 146
             p1 = 0.3675
             p2 = 0.4315
      sig.level = 0.05
          power = 0.1999816
    alternative = two.sided
 NOTE: n is number in *each* group 
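
On whether kappa can be treated as a proportion: a small sketch of how 
Cohen's (unweighted) kappa is computed from a confusion matrix. It is a 
chance-corrected ratio of agreements, not a count of successes out of n, so 
the binomial assumptions behind a proportion test may not carry over. The 
function name cohen_kappa and the toy matrix are hypothetical. The value 
should agree with the Kappa entry in cm1$overall from caret, assuming that 
entry is the unweighted kappa.

```r
# Cohen's kappa from a square confusion matrix (predictions in rows,
# reference in columns).
cohen_kappa <- function(tab) {
  n  <- sum(tab)
  po <- sum(diag(tab)) / n                      # observed agreement (accuracy)
  pe <- sum(rowSums(tab) * colSums(tab)) / n^2  # agreement expected by chance
  (po - pe) / (1 - pe)
}

# Hypothetical two-class confusion matrix, NOT the data from this thread.
toy <- matrix(c(40, 10, 5, 45), nrow = 2,
              dimnames = list(pred = c("a", "b"), ref = c("a", "b")))
cohen_kappa(toy)
```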

Any comments are really appreciated,
 
Monica

----------------------------------------
> From: pisican...@hotmail.com
> To: r-help@r-project.org
> CC: max.k...@pfizer.com
> Subject: [R] statistical significance of accuracy increase in classification
> Date: Tue, 24 Feb 2009 16:22:41 +0000
>
>
> Hi everyone,
>
> I would like to test for the statistical significance (for what it's worth 
> ...) of an increase in classification accuracy and kappa statistic between 
> different land classifications. The classifications were done using other 
> software (such as eCognition and See5), but the results were "sampled" at 
> locations where I know the "reference" class. Using the package "caret", I 
> built the confusion matrix. For now I am interested in the overall results, 
> which give the overall classification accuracy and the kappa statistic, 
> among others. Depending on which classification I test, I get a small 
> increase in accuracy and a slightly larger increase in the kappa statistic. 
> I wonder if there is a way to test the statistical significance of the 
> accuracy and kappa increase between the two classifications.
>
> Data example and some code:
>
> library(caret)
>
> ref <- c(15, 13, 13, 13, 13, 15, 14, 14, 14, 15, 13, 13, 13, 15, 13, 13, 13, 
> 15, 13, 13, 13, 13, 13, 13, 13,13, 14, 13, 13, 13, 13, 13, 13, 13, 15, 13, 
> 13, 15, 13, 15, 13, 13, 15, 13, 13, 13, 13, 13, 13, 13,13, 13, 13, 13, 13, 
> 15, 13, 13, 13, 13, 13, 13, 15, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 
> 13,13, 14, 13, 13, 13, 13, 13, 14, 14, 15, 15, 13, 13, 13, 13, 13, 15, 13, 
> 13, 13, 13, 13, 13, 13, 13,13, 13, 14, 13, 13, 13, 13, 13, 13, 13, 13, 13, 
> 13, 13, 13, 13, 13, 15, 13, 13, 13, 13, 13, 13, 13,13, 13, 13, 13, 13, 13, 
> 13, 14, 13, 13, 13, 13, 13, 13, 15, 13, 13, 13, 13, 13, 13)
>
> class1 <- c(14, 14, 13, 13, 13, 15, 13, 14, 15, 14, 14, 13, 14, 13, 13, 13, 
> 13, 13, 13, 13, 13, 13, 13, 14, 13,13, 13, 13, 13, 13, 13, 13, 13, 13, 15, 
> 13, 14, 13, 13, 14, 13, 13, 15, 13, 13, 13, 13, 13, 13, 13,13, 13, 15, 21, 
> 13, 15, 13, 21, 13, 13, 14, 13, 15, 13, 15, 13, 13, 14, 13, 13, 13, 13, 13, 
> 13, 13,13, 14, 14, 13, 13, 13, 13, 15, 15, 15, 15, 13, 13, 13, 13, 13, 5, 13, 
> 15, 13, 13, 13, 13, 13, 13,15, 13, 15, 14, 13, 13, 13, 13, 13, 13, 13, 13, 
> 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13,13, 13, 13, 13, 13, 13, 
> 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13)
>
> class2 <- c(14, 15, 13, 13, 13, 15, 13, 14, 15, 15, 14, 13, 14, 13, 13, 13, 
> 13, 13, 13, 13, 13, 13, 13, 14, 13,13, 13, 13, 13, 13, 13, 13, 13, 13, 15, 
> 13, 14, 13, 13, 15, 13, 13, 15, 14, 13, 13, 13, 13, 13, 13,13, 13, 15, 13, 
> 13, 15, 13, 21, 13, 13, 13, 13, 15, 13, 15, 15, 13, 14, 13, 13, 13, 13, 13, 
> 13, 15,13, 14, 14, 13, 13, 13, 13, 15, 14, 15, 15, 13, 14, 13, 13, 13, 15, 
> 13, 15, 13, 13, 13, 13, 13, 13,15, 13, 15, 14, 13, 13, 13, 13, 13, 13, 13, 
> 13, 13, 13, 13, 13, 13, 22, 13, 13, 13, 13, 13, 13, 13,13, 13, 13, 13, 13, 
> 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13)
>
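> # Use one common set of levels so both confusion matrices are square
> ref1 <- factor(ref, levels = c(5, 13, 14, 15, 21, 22))
> pred1 <- factor(class1, levels = c(5, 13, 14, 15, 21, 22))
> pred2 <- factor(class2, levels = c(5, 13, 14, 15, 21, 22))
>
> # Predictions in rows, reference classes in columns
> t1 <- table(pred1, ref1)
> t2 <- table(pred2, ref1)
>
> # Overall accuracy and kappa for each classification
> cm1 <- confusionMatrix(t1)
> cm1$overall
>
> cm2 <- confusionMatrix(t2)
> cm2$overall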
>
> As you can see, the increase in accuracy is very small, but the increase in 
> kappa is a little more substantial. Is this increase statistically 
> significant?
>
> Thanks for any help,
>
> Monica

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
