On Jun 18, 2010, at 11:08 PM, David Winsemius wrote:


On Jun 18, 2010, at 10:38 PM, David Jarvis wrote:

Hi, David.

accurately reflect how closely the model (GAM) fits the data. I was told

This was my presumption; I could be mistaken.

that the accuracy of the correlation can be improved using a root mean
square deviation (RMSD) calculation on binned data.

By whom? ...  and with what theoretical basis?

I talked with Christian Schunn. He mentioned that using RMSD would produce a better result for goodness-of-fit (if that term is not synonymous with correlation, I apologise -- I'm still rather new to this level of statistics):

http://www.lrdc.pitt.edu/schunn/gof/index.html

It was regarding a chart similar to:

http://i.imgur.com/X0gxV.png

In the chart, the values calculated for Pearson's, Spearman's, and Kendall's tau provide, in my opinion, an incorrect indicator as to the strength of the GAM's fit to the data. I could be wrong here, too.

His suggestion was to bin the means (in groups of 5 or so) to reduce the noise.

I doubt that your strategy offers any statistical advantage, but if you want to play around with it then consider:

binned.x <- round( (x + 2.5)/5)

> d <- c (1,3,5,4,3,6,3,1,5,7,8,9,4,3,2,7,3,6,8,9,5,3,1,4,5,8,9,3,3,2,5,7,8,8,5,4,3,2,6,4,3,1,4,5,6,8,9,0,7,7,5,4,3,3,2,1,3,4,5,6,7,9,0,2,4,3,3 )
> binned.d <- round( (d + 2.5)/5)
> print(binned.d)
[1] 1 1 2 1 1 2 1 1 2 2 2 2 1 1 1 2 1 2 2 2 2 1 1 1 2 2 2 1 1 1 2 2 2 2 2 1 1 1
[39] 2 1 1 1 1 2 2 2 2 0 2 2 2 1 1 1 1 1 1 1 2 2 2 2 0 1 1 1 1

That doesn't make sense to me.

Then I blame your powers of exposition. Without some sort of explicit example the parsing of English is very prone to error. If you want to pick out elements of x in some pre-specified order in groups of five then consider:

> x <- 1:100
>
> rep(1:20, each=5)
 [1]  1  1  1  1  1  2  2  2  2  2  3  3  3  3  3  4  4  4  4  4  5  5  5
[24]  5  5  6  6  6  6  6  7  7  7  7  7  8  8  8  8  8  9  9  9  9  9 10
[47] 10 10 10 10 11 11 11 11 11 12 12 12 12 12 13 13 13 13 13 14 14 14 14
[70] 14 15 15 15 15 15 16 16 16 16 16 17 17 17 17 17 18 18 18 18 18 19 19
[93] 19 19 19 20 20 20 20 20
> tapply(x, rep(1:20, each=5), mean)
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20   # this row is just indices
 3  8 13 18 23 28 33 38 43 48 53 58 63 68 73 78 83 88 93 98   # this row is the means

If you wanted them in random groups of roughly 5, then you could use sample(1:(n/5), n, replace=TRUE, prob=rep(5/n, n/5)) as the grouping index, where n is length(x).
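
A minimal sketch of the random-grouping idea, assuming the goal is a vector of group labels to hand to tapply rather than a resampled x. The names n, k, grp and grp_exact are illustrative and do not come from the thread, and set.seed is there only to make the illustration repeatable.

```r
set.seed(42)                 # only so the illustration is repeatable
x <- 1:100
n <- length(x)
k <- ceiling(n / 5)          # number of groups (assumed: about 5 per group)

# Draw a group label for every element; group sizes vary around 5
grp <- sample(k, n, replace = TRUE)
table(grp)                   # sizes of the groups that were actually drawn
tapply(x, grp, mean)         # mean of x within each random group

# For groups of exactly 5 with random membership, permute fixed labels instead
grp_exact <- sample(rep(seq_len(k), length.out = n))
tapply(x, grp_exact, mean)
```

With independent draws some groups can come out much smaller or larger than 5 (or empty), which is why the permuted-label variant may be preferable when equal bin sizes matter.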


My impression was that I should try to put every 5 values in a bin, average that bin, then calculate the RMSD between the observed values and the values from GAM. In other words (o is observed and m is model):

Do you intend that m[n] would be the predicted value from a model? How are you forming the groups of 5? Are they ordered? If so, ordered by observed or by predicted? (In R a "model" is a complex list structure, but may in some cases have a simple predicted value for each case. Again, a specific example might work wonders.)

Looking a bit more at that web page and its linked article and Excel spreadsheet, it appears you are hoping to construct calibration plots. It appears that the observations are sorted and then binned by observed predictor (rather than predicted) values. You then compare the summed GOF statistic on averages of the predicted and observed means for some "model" within those bins ... the nature of which is not at all clear to my eyes at this point. Sounds like a calibration analysis. I think R packages can offer more sophisticated methods. But you can at any rate use the methods I offered to bin your cases sorted on the predictor values.
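
The binning described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the data are simulated, lm() stands in for the GAM, the bin size of 5 is taken from the discussion, and sorting on the predictor is my reading of the linked page rather than something confirmed here. The names ord, bin and rmsd are illustrative.

```r
set.seed(1)                                  # repeatable illustration only
n <- 103                                     # deliberately not a multiple of 5
x <- runif(n, 0, 10)                         # predictor (simulated)
o <- 2 + 0.5 * x + rnorm(n)                  # "observed" values (simulated)
fit <- lm(o ~ x)                             # stand-in for the GAM
m <- fitted(fit)                             # model value for each case

ord <- order(x)                              # sort cases on the predictor
bin <- ceiling(seq_along(ord) / 5)           # 1 1 1 1 1 2 2 ...; last bin may be short

omean <- tapply(o[ord], bin, mean)           # mean observed value per bin
mmean <- tapply(m[ord], bin, mean)           # mean model value per bin

rmsd <- sqrt(mean((omean - mmean)^2))        # RMSD between the binned means
rmsd
```

Because the bin index is built with ceiling(seq_along(...) / 5), no values need to be dropped when the length is not a multiple of 5; the final bin is simply smaller.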



--
David.


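 bins <- 5   # number of values per bin

 # drop trailing values until the length is a multiple of the bin size
 while( length(o) %% bins != 0 ) {
   o <- o[-length(o)]
 }
 omean <- apply( matrix(o, bins), 2, mean )   # one mean per column (bin)

 while( length(m) %% bins != 0 ) {
   m <- m[-length(m)]
 }
 mmean <- apply( matrix(m, bins), 2, mean )

 # RMSD: square the differences first, then average, then take the root
 sqrt( mean( (omean - mmean) ^ 2 ) )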

But that feels sloppy, error prone, and fragile.

Joris mentioned that I could try using tapply with cut(d,round(length(d)/5)). I couldn't figure out how to get the means back from the factors.

Dave
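
On getting the means back from the factor that cut() returns, a small sketch follows. It assumes a shortened version of the d vector used earlier in the thread; f and bin_means are illustrative names. Note that cut() bins by value (equal-width intervals over the range of d), not by position, so this is not the same thing as averaging every 5 consecutive values.

```r
d <- c(1, 3, 5, 4, 3, 6, 3, 1, 5, 7, 8, 9, 4, 3, 2, 7, 3, 6, 8, 9)

# cut() splits the range of values into equal-width intervals and returns
# a factor recording which interval each value falls in
f <- cut(d, round(length(d) / 5))            # 4 intervals for 20 values
levels(f)

# tapply() applies mean() within each level; the result is a numeric
# vector named by the interval labels
bin_means <- tapply(d, f, mean)
bin_means

# To put each bin mean beside its original value, index by the factor codes
cbind(d, mean_of_bin = unname(bin_means[as.integer(f)]))
```

If an interval happens to contain no values, tapply returns NA for it, so the result may need na.omit() before any further arithmetic.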


David Winsemius, MD
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
