Yes, that's true. On a test set, the highest probability of being in the
smaller class is about 40%. (Incidentally, accuracy on the test set is much
higher when I use the best-according-to-Kappa model instead of the
best-according-to-Accuracy model.)
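
(In case it helps, this is roughly how I'm looking at the test set -- the
object names dat.test and dat.test.class and the class level "small" are
placeholders for my data, and model is the object returned by train():)

## Rough sketch of the test-set check; see the placeholder names above.
library(caret)

test.probs <- predict(model, dat.test, type = "prob")  # per-class probabilities
max(test.probs[, "small"])                             # tops out around 0.40

test.pred <- predict(model, dat.test)                  # hard class predictions
confusionMatrix(test.pred, dat.test.class)             # accuracy, Kappa, sensitivity, ...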

It looks like the ctree() method supports weights, but all it does is
multiply the class likelihoods, which isn't what I want. (That is, if I
assign a weight of 2 to all of the small-class instances, it generates the
same model, but says that the likelihood for the most-confident instances is
about 80% instead of 40%!)
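
(For reference, this is roughly what I tried -- "small" is a placeholder for
my minority-class label, and dat.dts / dat.dts.class are as in my original
message:)

## Fit party::ctree directly with case weights, doubling the minority class.
library(party)

dat.all <- data.frame(dat.dts, class = dat.dts.class)  # one frame for the formula interface
w <- ifelse(dat.all$class == "small", 2L, 1L)          # weight 2 for the small class

fit <- ctree(class ~ ., data = dat.all, weights = w)

## The tree structure comes out the same; only the reported leaf probabilities
## for the weighted class roughly double (about 0.8 instead of 0.4).
treeresponse(fit)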

I'm still not really sure why Kappa isn't behaving like a monotonically
increasing function of Accuracy, though.
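
(My understanding, for what it's worth, is that caret reports Cohen's Kappa
computed from the full confusion matrix: (observed agreement - expected
agreement) / (1 - expected agreement), where the expected agreement uses the
predicted-class marginals as well as the true base rate, so it isn't just a
rescaling of Accuracy. A quick check with made-up counts:)

## Cohen's Kappa from a 2x2 confusion matrix; the counts are invented only to
## show that Kappa depends on the prediction marginals, not just on Accuracy.
cm <- matrix(c(19180, 600,   # predicted "big":   19180 truly big, 600 truly small
                  20, 200),  # predicted "small":    20 truly big, 200 truly small
             nrow = 2, byrow = TRUE)

p.obs <- sum(diag(cm)) / sum(cm)                     # observed agreement (= Accuracy)
p.exp <- sum(rowSums(cm) * colSums(cm)) / sum(cm)^2  # expected agreement from marginals
(p.obs - p.exp) / (1 - p.exp)                        # Kappa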

Thanks!


On Wed, Jun 22, 2011 at 8:12 PM, kuhnA03 <max.k...@pfizer.com> wrote:

>  Harlan,
>
> It looks like your model is predicting (almost) everything to be the
> majority class (accuracy is almost the same as the largest class
> percentage). Try setting a test set aside and use confusionMatrix to look at
> how the model is predicting in more detail. You can try other models that
> will let you weight the minority class higher to get a more balanced
> prediction.
>
> Max
>
>
>
> On 6/22/11 3:37 PM, "Harlan Harris" <har...@harris.name> wrote:
>
> Hello,
>
> When evaluating different learning methods for a categorization problem
> with the (really useful!) caret package, I'm getting confusing results from
> the Kappa computation. The data is about 20,000 rows and a few dozen
> columns, and the categories are quite asymmetrical, 4.1% in one category and
> 95.9% in the other. When I train a ctree model as:
>
> model <- train(dat.dts,
>                  dat.dts.class,
>                  method='ctree',
>                  tuneLength=8,
>                  trControl=trainControl(number = 5, workers=1),
>                  metric='Kappa')
>
> I get the following puzzling numbers:
>
>
>
>   mincriterion  Accuracy  Kappa   Accuracy SD  Kappa SD
>   0.01          0.961     0.0609  0.00151      0.0264
>   0.15          0.962     0.049   0.00116      0.0248
>   0.29          0.963     0.0405  0.00227      0.035
>   0.43          0.964     0.0349  0.00257      0.0247
>   0.57          0.964     0.0382  0.0022       0.0199
>   0.71          0.964     0.0354  0.00255      0.0257
>   0.85          0.964     0.036   0.00224      0.024
>   0.99          0.965     0.0091  0.00173      0.0203
>
> (mincriterion sets how strong the evidence for a split must be before it is
> accepted into the tree.) The Accuracy numbers look sorta reasonable, if not
> great; the model overfits and barely beats the base rate if it builds a
> complicated tree. But
> the Kappa numbers go the opposite direction, and here's where I'm not sure
> what's going on. The examples in the vignette show Accuracy and Kappa being
> positively correlated. I thought Kappa was just (Accuracy - baserate)/(1 -
> baserate), but the reported Kappa is definitely not that.
>
> Suggestions? Aside from looking for a better model, which would be good
> advice here, what metric would you recommend? Thank you!
>
>  -Harlan
>
>
>
