Hi again,
Looking more into test statistics, I realized that maybe I can use
power.prop.test to see whether the difference between the 2 accuracies is zero or
not. Do you have any comments about that? Also, should I consider the kappa
statistic a kind of proportion as well and use the same test? If this does not
violate any important assumption, then:
power.prop.test(n = 146, p1 = 0.7877, p2 = 0.8014, strict = TRUE)
     Two-sample comparison of proportions power calculation

              n = 146
             p1 = 0.7877
             p2 = 0.8014
      sig.level = 0.05
          power = 0.0596356
    alternative = two.sided

NOTE: n is number in *each* group
which seems to say that the two accuracies are barely different, if the
reported 0.06 can be read like a p-value (0.06 > 0.05). Note, though, that the
output labels this number as power, not as a p-value.
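As a cross-check, here is a minimal sketch of a direct two-sample test of the two accuracies with prop.test, since power.prop.test reports power rather than a p-value. The counts 115 and 117 are an assumption: they are back-calculated from 0.7877 and 0.8014 with n = 146. The sketch also treats the two classifications as independent samples, which is questionable here because both were checked at the same 146 locations, so a paired test such as mcnemar.test might fit better.

```r
# Assumed counts of correctly classified samples, back-calculated from
# the reported accuracies (0.7877 * 146 ~ 115, 0.8014 * 146 ~ 117).
correct <- c(115, 117)
n_total <- c(146, 146)

# Two-sample test of equal proportions (stats package, loaded by default).
# Assumption: the two samples are treated as independent, which may not
# hold because both classifications share the same reference locations.
res <- prop.test(x = correct, n = n_total)

res$estimate  # the two sample proportions
res$p.value   # p-value for H0: the two proportions are equal
```

If the p-value is above 0.05, this test gives no evidence that the two accuracies differ.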
For the kappa statistic it would be:
power.prop.test(n = 146, p1 = 0.3675, p2 = 0.4315, strict = TRUE)
     Two-sample comparison of proportions power calculation

              n = 146
             p1 = 0.3675
             p2 = 0.4315
      sig.level = 0.05
          power = 0.1999816
    alternative = two.sided

NOTE: n is number in *each* group
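On whether kappa can be treated as a proportion: below is a minimal sketch of how Cohen's kappa is computed from a confusion matrix. It suggests that kappa is a chance-corrected agreement rather than a plain proportion of n cases. The function name cohen_kappa and the 2 x 2 table are hypothetical, made up for illustration, and are not my data.

```r
# Hypothetical helper: Cohen's kappa from a square confusion matrix
# (rows = predicted class, columns = reference class).
cohen_kappa <- function(tab) {
  n  <- sum(tab)
  po <- sum(diag(tab)) / n                      # observed agreement (accuracy)
  pe <- sum(rowSums(tab) * colSums(tab)) / n^2  # agreement expected by chance
  (po - pe) / (1 - pe)
}

# Made-up 2 x 2 table, for illustration only.
tab <- matrix(c(20,  5,
                10, 15), nrow = 2, byrow = TRUE)

kappa_value <- cohen_kappa(tab)
kappa_value
```

If that is right, kappa is not a count of successes out of n trials, so feeding kappa values into power.prop.test as p1 and p2 may not be justified.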
Any comments are really appreciated,
Monica
----------------------------------------
> From: [email protected]
> To: [email protected]
> CC: [email protected]
> Subject: [R] statistical significance of accuracy increase in classification
> Date: Tue, 24 Feb 2009 16:22:41 +0000
>
>
> Hi everyone,
>
> I would like to test for the statistical significance (for what it is worth ...)
> of an increase in classification accuracy and kappa statistics between different
> land classifications. The classifications were done using other software
> (like eCognition and See5), but the results were "sampled" at locations where
> I know the "reference" class. So, using the package "caret", I built the
> confusion matrix. For now I am interested in the overall results, which give
> the overall classification accuracy and kappa statistic, among others.
> Depending on which classification I test, I get a small increase in accuracy
> and a somewhat larger increase in the kappa statistic. I wonder if there is a way
> to do a statistical significance test for the accuracy and kappa increase
> between the 2 classifications.
>
> Data example and some code:
>
> library(caret)
>
> ref <- c(15, 13, 13, 13, 13, 15, 14, 14, 14, 15, 13, 13, 13, 15, 13, 13, 13,
> 15, 13, 13, 13, 13, 13, 13, 13,13, 14, 13, 13, 13, 13, 13, 13, 13, 15, 13,
> 13, 15, 13, 15, 13, 13, 15, 13, 13, 13, 13, 13, 13, 13,13, 13, 13, 13, 13,
> 15, 13, 13, 13, 13, 13, 13, 15, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13,
> 13,13, 14, 13, 13, 13, 13, 13, 14, 14, 15, 15, 13, 13, 13, 13, 13, 15, 13,
> 13, 13, 13, 13, 13, 13, 13,13, 13, 14, 13, 13, 13, 13, 13, 13, 13, 13, 13,
> 13, 13, 13, 13, 13, 15, 13, 13, 13, 13, 13, 13, 13,13, 13, 13, 13, 13, 13,
> 13, 14, 13, 13, 13, 13, 13, 13, 15, 13, 13, 13, 13, 13, 13)
>
> class1 <- c(14, 14, 13, 13, 13, 15, 13, 14, 15, 14, 14, 13, 14, 13, 13, 13,
> 13, 13, 13, 13, 13, 13, 13, 14, 13,13, 13, 13, 13, 13, 13, 13, 13, 13, 15,
> 13, 14, 13, 13, 14, 13, 13, 15, 13, 13, 13, 13, 13, 13, 13,13, 13, 15, 21,
> 13, 15, 13, 21, 13, 13, 14, 13, 15, 13, 15, 13, 13, 14, 13, 13, 13, 13, 13,
> 13, 13,13, 14, 14, 13, 13, 13, 13, 15, 15, 15, 15, 13, 13, 13, 13, 13, 5, 13,
> 15, 13, 13, 13, 13, 13, 13,15, 13, 15, 14, 13, 13, 13, 13, 13, 13, 13, 13,
> 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13,13, 13, 13, 13, 13, 13,
> 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13)
>
> class2 <- c(14, 15, 13, 13, 13, 15, 13, 14, 15, 15, 14, 13, 14, 13, 13, 13,
> 13, 13, 13, 13, 13, 13, 13, 14, 13,13, 13, 13, 13, 13, 13, 13, 13, 13, 15,
> 13, 14, 13, 13, 15, 13, 13, 15, 14, 13, 13, 13, 13, 13, 13,13, 13, 15, 13,
> 13, 15, 13, 21, 13, 13, 13, 13, 15, 13, 15, 15, 13, 14, 13, 13, 13, 13, 13,
> 13, 15,13, 14, 14, 13, 13, 13, 13, 15, 14, 15, 15, 13, 14, 13, 13, 13, 15,
> 13, 15, 13, 13, 13, 13, 13, 13,15, 13, 15, 14, 13, 13, 13, 13, 13, 13, 13,
> 13, 13, 13, 13, 13, 13, 22, 13, 13, 13, 13, 13, 13, 13,13, 13, 13, 13, 13,
> 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13)
>
> ref1 <- factor(ref, levels = c(5, 13, 14, 15, 21, 22))
> pred1 <- factor(class1, levels = c(5, 13, 14, 15, 21, 22))
> pred2 <- factor(class2, levels = c(5, 13, 14, 15, 21, 22))
>
> t1 <- table(pred1, ref1)
> t2 <- table(pred2, ref1)
>
> cm1 <- confusionMatrix(t1)
> cm1$overall
>
> cm2 <- confusionMatrix(t2)
> cm2$overall
>
> As you see, the increase in accuracy is very small, but the increase in kappa
> is a little more substantial. Is this increase statistically significant?
>
> Thanks for any help,
>
> Monica
______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.