Yes, that's true. On a held-out test set, the highest predicted probability of being in the smaller class is about 40%. (Incidentally, accuracy on the test set is much higher when I use the best-according-to-Kappa model instead of the best-according-to-Accuracy model.)
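Concretely, here is roughly the test-set check I'm doing (model is the fit from the train() call quoted below; dat.test, dat.test.class, and the class level name "small" are placeholders):

    library(caret)
    ## Class probabilities on the held-out data; predict() on a train()
    ## object with type = "prob" returns one column per class level.
    probs <- predict(model, newdata = dat.test, type = "prob")
    max(probs[["small"]])   # tops out near 0.40 for the minority class
    ## confusionMatrix reports Accuracy, Kappa, sensitivity, etc. together.
    confusionMatrix(predict(model, newdata = dat.test), dat.test.class)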
It looks like the ctree() method supports weights, but all it does is multiply the class likelihoods, which isn't what I want. (That is, if I assign a weight of 2 to all of the small-class instances, it generates the same model, but says that the likelihood for the most-confident instances is about 80% instead of 40%!)

I'm still not really understanding why Kappa is not acting like a positive monotonic function of Accuracy, though. Thanks!

On Wed, Jun 22, 2011 at 8:12 PM, kuhnA03 <max.k...@pfizer.com> wrote:

> Harlan,
>
> It looks like your model is predicting (almost) everything to be the
> majority class (accuracy is almost the same as the largest class
> percentage). Try setting a test set aside and use confusionMatrix to look
> at how the model is predicting in more detail. You can try other models
> that will let you weight the minority class higher to get a more balanced
> prediction.
>
> Max
>
>
> On 6/22/11 3:37 PM, "Harlan Harris" <har...@harris.name> wrote:
>
> Hello,
>
> When evaluating different learning methods for a categorization problem
> with the (really useful!) caret package, I'm getting confusing results
> from the Kappa computation. The data is about 20,000 rows and a few dozen
> columns, and the categories are quite asymmetrical: 4.1% in one category
> and 95.9% in the other. When I train a ctree model as:
>
>     model <- train(dat.dts,
>                    dat.dts.class,
>                    method = 'ctree',
>                    tuneLength = 8,
>                    trControl = trainControl(number = 5, workers = 1),
>                    metric = 'Kappa')
>
> I get the following puzzling numbers:
>
>     mincriterion  Accuracy  Kappa   Accuracy SD  Kappa SD
>     0.01          0.961     0.0609  0.00151      0.0264
>     0.15          0.962     0.049   0.00116      0.0248
>     0.29          0.963     0.0405  0.00227      0.035
>     0.43          0.964     0.0349  0.00257      0.0247
>     0.57          0.964     0.0382  0.0022       0.0199
>     0.71          0.964     0.0354  0.00255      0.0257
>     0.85          0.964     0.036   0.00224      0.024
>     0.99          0.965     0.0091  0.00173      0.0203
>
> (mincriterion sets how strong the evidence for a split must be before it
> is accepted into the tree.) The Accuracy numbers look sorta reasonable,
> if not great; the model overfits and barely beats the base rate if it
> builds a complicated tree. But the Kappa numbers go in the opposite
> direction, and here's where I'm not sure what's going on. The examples in
> the vignette show Accuracy and Kappa being positively correlated. I
> thought Kappa was just (Accuracy - baserate)/(1 - baserate), but the
> reported Kappa is definitely not that.
>
> Suggestions? Aside from looking for a better model, which would be good
> advice here, what metric would you recommend? Thank you!
>
> -Harlan
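P.S. As far as I can tell, the Kappa that caret reports is Cohen's kappa, which compares observed accuracy with the agreement expected from the row and column margins of the confusion matrix, not with the base rate alone, so the formula I guessed above isn't right. A minimal sketch, with made-up counts:

    ## Toy confusion matrix; all counts invented for illustration.
    ## Rows are predictions, columns are observed classes.
    tab <- matrix(c( 50,  10,
                     30, 910),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(pred = c("small", "big"),
                                  obs  = c("small", "big")))
    n   <- sum(tab)
    p_o <- sum(diag(tab)) / n                      # observed accuracy
    p_e <- sum(rowSums(tab) * colSums(tab)) / n^2  # agreement expected from margins
    (p_o - p_e) / (1 - p_e)                        # Cohen's kappa, about 0.69 here

Because p_e depends on how the predictions themselves are distributed, two models with the same accuracy can have very different kappas; a model that predicts nearly everything as the majority class pushes p_e up toward p_o, so its kappa collapses toward zero even while its accuracy stays near the base rate.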