Thanks Steve, I'm halfway there with:
foo <- cbind(foo, range_group=cut(foo$score, breaks=c(.9, .8, .7, .6, .5, .4, .3, .2, .1)))
with(foo, tapply(score, list(range_group), mean))

This works, but I only get one of the 3 columns I need, mean(score). I'm not sure how to get the other two. Each one is really the result of a few columns. For example:

a <- count of entries in this "range"
b <- number of entries in this "range" labeled true
c <- b / a   (what percentage of entries in this range are really true?)

My guess is that there is a way to add more columns to the tapply, but I'm not sure where/how.

-N

On 8/5/09 11:22 AM, Steve Lianoglou wrote:
> Hi,
>
> On Aug 5, 2009, at 2:11 PM, Noah Silverman wrote:
>
>> Hello,
>>
>> I asked this as part of a previous message, but never really figured
>> out a usable solution. So this is a second attempt.
>>
>> I have a process containing an SVM. The end result is the
>> probability that the class is true. That result is added back to the
>> original data.
>>
>> So I wind up with a data.frame that looks like this
>>
>> label,v1,v2,v3,prob_true
>>
>> What I want to do is measure how accurate my model is for each range
>> of probability. (I've seen this done in a few published papers and
>> found it a very useful way to visualize things.)
>>
>> My hope/guess is that there is some kind of package for R that does
>> this since it should be a common need.
>>
>> Here is an example of what I'd like to be able to generate:
>>
>> range     number of items   mean(probability)   true_accuracy
>> 100-90%   20                .924                .90
>> 90-80%    50                .825                .84
>> 80-70%    214               .75                 .71
>> etc...
>>
>> range is the range of predicted values by the SVM
>> mean(probability) is the mean of the PREDICTED probability of items
>> in that range
>> true_accuracy is the mean of the ACTUAL probability of items in that
>> range.
>>
>> In English I would explain it as, "Of the data where our SVM
>> predicted a true probability of 70-80%, the data was actually 71% true."
>>
>> It might be really helpful to be able to graph this somehow.
>> (Again, there must be some package in R for this??)
>> With mean(predicted_probability) on one axis and
>> mean(true_probability) on the other axis.
>>
>> Any thoughts, comments, ideas, etc. would be appreciated!
>
> Take a look at the cut function, and the code in the examples of ?cut
> (eg, take a look at the output when combined w/ table(cut(..)) ).
>
> Sending in your own vector for the ``breaks`` param in order to bin as
> you like should get you 90% of the way to building the table you're
> after.
>
> -steve
>
> --
> Steve Lianoglou
> Graduate Student: Computational Systems Biology
>   | Memorial Sloan-Kettering Cancer Center
>   | Weill Medical College of Cornell University
> Contact Info: http://cbio.mskcc.org/~lianos/contact

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
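
For the three columns asked about at the top of this message (a, b, and c = b / a), here is a minimal sketch of one possible approach: call tapply once per column on the same grouping factor, then bind the results into a data.frame. The simulated data, the column name `label` (assumed to be a TRUE/FALSE vector), and the breaks running from 0 to 1 are all assumptions made for illustration, not taken from the thread.

```r
# Hypothetical data: 'score' is the predicted probability and 'label' is the
# actual class (TRUE/FALSE). Both the data and the name 'label' are assumptions.
set.seed(1)
foo <- data.frame(score = runif(200))
foo$label <- runif(200) < foo$score

# Bin the scores. Including 0 and 1 in the breaks keeps the lowest and
# highest scores from falling outside every bin and becoming NA.
foo$range_group <- cut(foo$score, breaks = seq(0, 1, by = 0.1),
                       include.lowest = TRUE)

# One tapply per column, all grouped by the same factor.
n          <- with(foo, tapply(score, range_group, length))  # a: count of entries
n_true     <- with(foo, tapply(label, range_group, sum))     # b: entries labeled true
mean_score <- with(foo, tapply(score, range_group, mean))    # mean predicted probability

# Assemble the summary table, one row per range.
result <- data.frame(range         = levels(foo$range_group),
                     n             = as.vector(n),
                     mean_score    = as.vector(mean_score),
                     true_accuracy = as.vector(n_true / n))  # c: b / a
print(result)
```

Plotting result$mean_score against result$true_accuracy would then give the kind of graph described in the original question. A bin with no entries would show up as NA in every column under this approach.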