Hi, David.

>> accurately reflect how closely the model (GAM) fits the data. I was told

This was my presumption; I could be mistaken.

>> that the accuracy of the correlation can be improved using a root mean
>> square deviation (RMSD) calculation on binned data.
>
> By whom? ... and with what theoretical basis?

I talked with Christian Schunn. He mentioned that using RMSD would produce a better measure of goodness-of-fit (if that term is not synonymous with correlation, I apologise -- I'm still rather new to this level of statistics):

http://www.lrdc.pitt.edu/schunn/gof/index.html

It was regarding a chart similar to:

http://i.imgur.com/X0gxV.png

In the chart, the Pearson, Spearman, and Kendall's tau correlations give, in my opinion, an incorrect indication of the strength of GAM's fit to the data. I could be wrong here, too. His suggestion was to bin the means (in groups of 5 or so) to reduce the noise.

> I doubt that your strategy offers any statistical advantage, but if you want
> to play around with it then consider:
>
> binned.x <- round( (x + 2.5)/5)

> d <- c(1,3,5,4,3,6,3,1,5,7,8,9,4,3,2,7,3,6,8,9,5,3,1,4,5,8,9,3,3,2,5,7,8,8,5,4,3,2,6,4,3,1,4,5,6,8,9,0,7,7,5,4,3,3,2,1,3,4,5,6,7,9,0,2,4,3,3)
> binned.d <- round( (d + 2.5)/5)
> print(binned.d)
 [1] 1 1 2 1 1 2 1 1 2 2 2 2 1 1 1 2 1 2 2 2 2 1 1 1 2 2 2 1 1 1 2 2 2 2 2 1 1 1
[39] 2 1 1 1 1 2 2 2 2 0 2 2 2 1 1 1 1 1 1 1 2 2 2 2 0 1 1 1 1

That doesn't make sense to me. My impression was that I should put every 5 values in a bin, average each bin, then calculate the RMSD between the binned observed values and the binned values from GAM. In other words (o is observed and m is model):

  bins <- 5

  # Drop trailing values until the length is a multiple of the bin size.
  while( length(o) %% bins != 0 ) {
    o <- o[-length(o)]
  }
  # Each matrix column holds one bin of 5 values; average each column.
  omean <- apply( matrix(o, bins), 2, mean )

  while( length(m) %% bins != 0 ) {
    m <- m[-length(m)]
  }
  mmean <- apply( matrix(m, bins), 2, mean )

  # RMSD: square the differences first, then average, then take the root.
  sqrt( mean( (omean - mmean)^2 ) )

But that feels sloppy, error prone, and fragile.

Joris mentioned that I could try using tapply with cut(d,round(length(d)/5)). I couldn't figure out how to get the means back from the factors.

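For reference, the binning-and-RMSD idea described above can be sketched without the while loops. This is only a minimal sketch under stated assumptions: bin_means and binned_rmsd are hypothetical helper names, the data are made up, and bins are formed by position (every 5 consecutive values). Unlike the loop above, a trailing partial bin is averaged rather than dropped. Note that cut(d, n) splits the range of the values into n intervals, so tapply(d, cut(d, n), mean) returns the mean of each value interval rather than of each run of 5 consecutive observations.

```r
# Hypothetical helper: mean of each consecutive group of `size` values.
# A trailing partial group is averaged over however many values it holds.
bin_means <- function(x, size = 5) {
  groups <- ceiling(seq_along(x) / size)   # 1,1,1,1,1,2,2,2,2,2,...
  as.vector(tapply(x, groups, mean))       # tapply returns the means directly
}

# Hypothetical helper: RMSD between binned observed and binned model values.
binned_rmsd <- function(o, m, size = 5) {
  stopifnot(length(o) == length(m))
  sqrt(mean((bin_means(o, size) - bin_means(m, size))^2))
}

# Made-up data, purely for illustration.
o <- c(1, 3, 5, 4, 3, 6, 3, 1, 5, 7, 8, 9)
m <- c(2, 3, 4, 4, 4, 5, 4, 2, 5, 6, 7, 8)

print(bin_means(o))
print(binned_rmsd(o, m))
```

Grouping by index keeps the code short and avoids truncating either vector; whether binning actually improves the goodness-of-fit measure is a separate statistical question that this sketch does not settle.
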
Dave

	[[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.