Re: [R] binning results

Noah Silverman Wed, 05 Aug 2009 11:46:54 -0700

Thanks Steve,

I'm halfway there with:


# breaks span 0..1 so that scores above .9 or below .1 are not turned into NA
foo <- cbind(foo, range_group=cut(foo$score, breaks=seq(0, 1, by=.1),
                                  include.lowest=TRUE))
with(foo, tapply(score, list(range_group), mean))


This works, but I only get one of the 3 columns I need, mean(score).

I'm not sure how to get the other two.

It really is the result of a few columns.  For example:
a <- count of entries in this "range"
b <- number of entries in this "range" labeled true
c <- b / a   (What percentage of entries in this range are really true)


My guess is that there is a way to add more columns to the tapply, but 
I'm not sure where/how.
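
One possible way to get all three columns, as a minimal sketch rather than a definitive answer: call tapply() once per column on the same grouping factor and bind the results into a data.frame. The data below is simulated, and the `label` column name (TRUE/FALSE for the actual class) is an assumption, since only `score` appears in the code above.

```r
# Simulated stand-in for the real data (hypothetical values):
# 'score' is the predicted probability, 'label' the actual class.
set.seed(1)
foo <- data.frame(score = runif(500))
foo$label <- runif(500) < foo$score

# Bin the scores into ten equal-width ranges covering [0, 1].
foo$range_group <- cut(foo$score, breaks = seq(0, 1, by = 0.1),
                       include.lowest = TRUE)

# One tapply() call per column, all grouped by the same factor.
n_items    <- with(foo, tapply(score, range_group, length))  # a
n_true     <- with(foo, tapply(label, range_group, sum))     # b
mean_score <- with(foo, tapply(score, range_group, mean))

result <- data.frame(range         = levels(foo$range_group),
                     n_items       = as.vector(n_items),
                     mean_score    = as.vector(mean_score),
                     true_accuracy = as.vector(n_true / n_items))  # c = b / a
print(result)
```

Since `label` is logical, sum() counts the TRUE entries, so `n_true / n_items` is the same as taking mean(label) within each range.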

-N

On 8/5/09 11:22 AM, Steve Lianoglou wrote:
> Hi,
>
> On Aug 5, 2009, at 2:11 PM, Noah Silverman wrote:
>
>> Hello,
>>
>> I asked this as part of a previous message, but never really figured 
>> out a usable solution.  So this is a second attempt.
>>
>> I have a process containing an SVM.  The end result is the 
>> probability that the class is true.  That result is added back to the 
>> original data.
>>
>> So I wind up with a data.frame that looks like this
>>
>> label,v1,v2,v3,prob_true
>>
>> What I want to do is measure how accurate my model is for each range 
>> of probability.  (I've seen this done in a few published papers and 
>> found it a very useful way to visualize things.)
>>
>> My hope/guess is that there is some kind of package for R that does 
>> this since it should be a common need.
>>
>> Here is an example of what I'd like to be able to generate:
>>
>> range      number of items   mean(probability)   true_accuracy
>> 100-90%    20                .924                .90
>> 90-80%     50                .825                .84
>> 80-70%     214               .75                 .71
>> etc...
>>
>> range is the range of predicted values by the SVM
>> mean(probability) is the mean of the PREDICTED probability of items 
>> in that range
>> true_accuracy is the mean of the ACTUAL probability of items in that 
>> range.
>>
>> In English I would explain it as, "Of the data where our SVM 
>> predicted a true probability of 70-80%, the data was actually 71% true."
>>
>> It might be really helpful to be able to graph this somehow.  
>> (Again, there must be some package in R for this?)
>> With mean(predicted_probability) on one axis and 
>> mean(true_probability) on the other axis.
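
A plot like the one described here is often called a calibration plot or reliability diagram. The following is a rough sketch in base R only, with no special package assumed. The `score` and `label` vectors are simulated placeholders for the SVM output and the actual class.

```r
# Simulated placeholders (hypothetical): predicted probability and actual class.
set.seed(7)
score <- runif(1000)
label <- runif(1000) < score

# Group predictions into ten equal-width ranges.
bins <- cut(score, breaks = seq(0, 1, by = 0.1), include.lowest = TRUE)

# Per-range mean predicted probability and observed fraction of TRUE labels.
mean_pred <- as.vector(tapply(score, bins, mean))
frac_true <- as.vector(tapply(label, bins, mean))

# Predicted on the x axis, observed on the y axis.
plot(mean_pred, frac_true, type = "b",
     xlim = c(0, 1), ylim = c(0, 1),
     xlab = "mean(predicted probability)",
     ylab = "fraction actually true")
abline(0, 1, lty = 2)  # dashed diagonal marks perfect agreement
```

Points close to the dashed diagonal mean the predicted probabilities agree with the observed frequencies in that range.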
>>
>> Any thoughts, comments, ideas, etc. would be appreciated!
>
> Take a look at the cut function, and the code in the examples of ?cut 
> (eg, take a look at the output when combined w/ table(cut(..)) ).
>
> Sending in your own vector for the ``breaks`` param in order to bin as 
> you like should get you 90% of the way to building the table you're 
> after.
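
As a small illustration of the table(cut(..)) suggestion, here is a sketch under the assumption that the scores lie in [0, 1]. The `score` vector is made up for the example.

```r
# Hypothetical scores standing in for the SVM output.
set.seed(42)
score <- runif(200)

# Custom breaks give ten equal-width bins; include.lowest keeps a score of 0.
bins <- cut(score, breaks = seq(0, 1, by = 0.1), include.lowest = TRUE)

# table() counts how many scores fall into each bin.
counts <- table(bins)
print(counts)
```

This gives the "number of items" column. The other columns need a grouped summary such as tapply() over the same factor.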
>
> -steve
>
> -- 
> Steve Lianoglou
> Graduate Student: Computational Systems Biology
>   |  Memorial Sloan-Kettering Cancer Center
>   |  Weill Medical College of Cornell University
> Contact Info: http://cbio.mskcc.org/~lianos/contact
>


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
