Re: [R] Root mean square on binned GAM results

David Winsemius Fri, 18 Jun 2010 20:39:23 -0700


On Jun 18, 2010, at 11:08 PM, David Winsemius wrote:

On Jun 18, 2010, at 10:38 PM, David Jarvis wrote:
Hi, David.
accurately reflect how closely the model (GAM) fits the data. I was told
This was my presumption; I could be mistaken.
that the accuracy of the correlation can be improved using a root mean
square deviation (RMSD) calculation on binned data.

By whom? ...  and with what theoretical basis?
I talked with Christian Schunn. He mentioned that using RMSD would produce a better result for goodness-of-fit (if that term is not synonymous with correlation, I apologise -- I'm still rather new to this level of statistics):
http://www.lrdc.pitt.edu/schunn/gof/index.html

It was regarding a chart similar to:

http://i.imgur.com/X0gxV.png
In the chart, the calculations for Pearson's, Spearman's, and Kendall's Tau provide, in my opinion, an incorrect indicator as to the strength of GAM's fit to the data. I could be wrong here, too.
His suggestion was to bin the means (in groups of 5 or so) to reduce the noise.
I doubt that your strategy offers any statistical advantage, but if you want to play around with it then consider:
binned.x <- round( (x + 2.5)/5)
> d <-c(1,3,5,4,3,6,3,1,5,7,8,9,4,3,2,7,3,6,8,9,5,3,1,4,5,8,9,3,3,2,5,7,8,8,5,4,3,2,6,4,3,1,4,5,6,8,9,0,7,7,5,4,3,3,2,1,3,4,5,6,7,9,0,2,4,3,3)
> binned.d <- round( (d + 2.5)/5)
> print(binned.d)
 [1] 1 1 2 1 1 2 1 1 2 2 2 2 1 1 1 2 1 2 2 2 2 1 1 1 2 2 2 1 1 1 2 2 2 2 2 1 1 1
[39] 2 1 1 1 1 2 2 2 2 0 2 2 2 1 1 1 1 1 1 1 2 2 2 2 0 1 1 1 1

That doesn't make sense to me.
Then I blame your powers of exposition. Without some sort of explicit example the parsing of English is very prone to error. If you want to pick out elements of x in some pre-specified order in groups of five then consider:
> x <- 1:100
>
> rep(1:20, each=5)
 [1]  1  1  1  1  1  2  2  2  2  2  3  3  3  3  3  4  4  4  4  4  5  5  5
[24]  5  5  6  6  6  6  6  7  7  7  7  7  8  8  8  8  8  9  9  9  9  9 10
[47] 10 10 10 10 11 11 11 11 11 12 12 12 12 12 13 13 13 13 13 14 14 14 14
[70] 14 15 15 15 15 15 16 16 16 16 16 17 17 17 17 17 18 18 18 18 18 19 19
[93] 19 19 19 20 20 20 20 20
> tapply(x, rep(1:20, each=5), mean)
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20   # this row is just indices
 3  8 13 18 23 28 33 38 43 48 53 58 63 68 73 78 83 88 93 98   # this row is the means
If you wanted them in random groups of roughly 5, then you could use sample(x, prob=rep(5/n, n/5))
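Since sample() takes prob as one weight per element of x, another route to random groups of about five is to shuffle a vector of group labels and hand that to tapply(). This is only a sketch of that idea; the names n, grp and group.means are illustrative and do not come from the thread.

```r
# Sketch only: n, grp and group.means are illustrative names.
set.seed(1)                      # only so the example is repeatable
x <- 1:100
n <- length(x)

# Balanced labels 1..ceiling(n/5), recycled to length n, then shuffled.
grp <- sample(rep(seq_len(ceiling(n / 5)), length.out = n))

# One mean per randomly formed group.
group.means <- tapply(x, grp, mean)
```

When n is not a multiple of 5, some groups end up with four members instead of five, which is one reading of "roughly 5".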
My impression was that I should try to put every 5 values in a bin, average that bin, then calculate the RMSD between the observed values and the values from GAM. In other words (o is observed and m is model):
Do you intend that m[n] would be the predicted value from a model? How are you forming the groups of 5? Are they ordered? If so, ordered by observed or by predicted? (In R a "model" is a complex list structure, but may in some cases have a simple predicted value for each case.) Again, a specific example might work wonders.

Looking a bit more at that web page and its linked article and Excel spreadsheet, it appears you are hoping to construct calibration plots. It appears that the observations are sorted and then binned by observed predictor (rather than predicted) values. You then compare the summed GOF statistic on averages of the predicted and observed means for some "model" within those bins ... the nature of which is not at all clear to my eyes at this point. Sounds like a calibration analysis. I think R packages can offer more sophisticated methods. But you can at any rate use the methods I offered to bin your cases sorted on the predictor values.
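As a rough illustration of binning cases sorted on the predictor, the sketch below sorts on a predictor, forms consecutive bins of five, and compares binned observed means with binned fitted means. Everything in it is an assumption made for the example: the data are simulated, loess() stands in for the GAM, and the names pred, obs, fit, o.mean and m.mean are invented here. It is not claimed to reproduce the procedure on the linked page.

```r
# Sketch with simulated data; loess() is a stand-in smoother, not the GAM
# discussed in the thread. All object names are illustrative.
set.seed(42)
pred <- runif(103, 0, 10)                  # made-up predictor values
obs  <- sin(pred) + rnorm(103, sd = 0.5)   # made-up noisy observations
fit  <- predict(loess(obs ~ pred))         # fitted value for each case

ord <- order(pred)                         # sort cases on the predictor
bin <- ceiling(seq_along(ord) / 5)         # consecutive bins of 5; last bin may be short

o.mean <- tapply(obs[ord], bin, mean)      # binned observed means
m.mean <- tapply(fit[ord], bin, mean)      # binned model means

# Square the differences first, then average, then take the root.
rmsd <- sqrt(mean((o.mean - m.mean)^2))
rmsd
```

Using a grouping index with tapply() avoids trimming the vectors to a multiple of the bin size; the final, shorter bin is simply averaged over fewer cases.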


--
David.


 bins <- 5

 while( length(o) %% bins != 0 ) {
   o <- o[-length(o)]
 }
 omean <- apply( matrix(o, bins), 2, mean )

 while( length(m) %% bins != 0 ) {
   m <- m[-length(m)]
 }
 mmean <- apply( matrix(m, bins), 2, mean )

 sqrt( mean( (omean - mmean)^2 ) )

But that feels sloppy, error prone, and fragile.

Joris mentioned that I could try using tapply with cut(d,round(length(d)/5)). I couldn't figure out how to get the means back from the factors.
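For what it is worth, tapply() already returns the means as a named vector, one entry per level of the factor produced by cut(). The sketch below uses the first 20 values of the d vector shown earlier in the thread; f and bin.means are illustrative names. Note that cut() with a number of breaks groups by equal-width value intervals, not by position in the vector, so this is a different binning from "every 5 values".

```r
# First 20 values of d from earlier in the thread; f and bin.means are
# illustrative names.
d <- c(1,3,5,4,3,6,3,1,5,7,8,9,4,3,2,7,3,6,8,9)

f <- cut(d, round(length(d) / 5))   # factor with 4 equal-width value intervals
bin.means <- tapply(d, f, mean)     # named vector: one mean per interval

bin.means                           # means labelled by interval
unname(bin.means)                   # the same means as plain numbers
```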


Dave


David Winsemius, MD
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


