Hi,

On Aug 5, 2009, at 2:11 PM, Noah Silverman wrote:

Hello,

I asked this as part of a previous message, but never really figured out a usable solution. So this is a second attempt.

I have a process containing an SVM. The end result is the probability that the class is true. That result is added back to the original data.

So I wind up with a data.frame that looks like this

label,v1,v2,v3,prob_true

What I want to do is measure how accurate my model is for each range of probability. (I've seen this done in a few published papers and found it a very useful way to visualize things.)

My hope/guess is that there is some kind of package for R that does this since it should be a common need.

Here is an example of what I'd like to be able to generate:

range        number of items        mean(probability)   true_accuracy
100-90%      20                     .924                .90
90-80%       50                     .825                .84
80-70%       214                    .75                 .71
etc...

range is the range of predicted values by the SVM
mean(probability) is the mean of the PREDICTED probability of items in that range
true_accuracy is the mean of the ACTUAL probability of items in that range.

In English I would explain it as, "Of the data where our SVM predicted a true probability of 70-80%, the data was actually 71% true."

It might be really helpful to be able to graph this somehow, with mean(predicted_probability) on one axis and mean(true_probability) on the other axis. (Again, there must be some package in R for this?)

Any thoughts, comments, ideas, etc. would be appreciated!

Take a look at the cut function, and the code in the examples of ?cut (eg, take a look at the output when combined w/ table(cut(..)) ).

Sending in your own vector for the ``breaks`` param in order to bin as you like should get you 90% of the way to building the table you're after.
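
For illustration, here is a rough sketch of how cut() with a custom ``breaks`` vector could produce that kind of table. It runs on simulated data, since the real data.frame isn't available; the column names label and prob_true are taken from the description above, while the 0/1 coding of label, the ten equal-width bins, and the names d and calib are assumptions made only for this example.

```r
# Simulated stand-in for the real data (values are made up for illustration).
# `label` is assumed to be coded 0/1; `prob_true` is the predicted probability.
set.seed(1)
n_obs <- 1000
prob_true <- runif(n_obs)
label <- rbinom(n_obs, size = 1, prob = prob_true)
d <- data.frame(label = label, prob_true = prob_true)

# Bin the predictions using a custom `breaks` vector (ten equal-width ranges).
breaks <- seq(0, 1, by = 0.1)
d$range <- cut(d$prob_true, breaks = breaks, include.lowest = TRUE)

# One row per bin: item count, mean predicted probability, observed fraction true.
calib <- data.frame(
  count         = as.vector(table(d$range)),
  mean_prob     = as.vector(tapply(d$prob_true, d$range, mean)),
  true_accuracy = as.vector(tapply(d$label, d$range, mean)),
  row.names     = levels(d$range)
)
print(calib)

# Predicted vs. observed; points near the dashed diagonal suggest good calibration.
plot(calib$mean_prob, calib$true_accuracy,
     xlab = "mean(predicted probability)", ylab = "observed fraction true",
     xlim = c(0, 1), ylim = c(0, 1))
abline(0, 1, lty = 2)
```

With real data you would skip the simulation step and start from your own data.frame. If a bin ends up empty, tapply() returns NA for it, so you may want wider bins at the sparse end.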

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
  |  Memorial Sloan-Kettering Cancer Center
  |  Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
