On Jun 18, 2010, at 11:08 PM, David Winsemius wrote:
On Jun 18, 2010, at 10:38 PM, David Jarvis wrote:
Hi, David.
accurately reflect how closely the model (GAM) fits the data. I was
told
This was my presumption; I could be mistaken.
that the accuracy of the correlation can be improved using a root
mean
square deviation (RMSD) calculation on binned data.
By whom? ... and with what theoretical basis?
I talked with Christian Schunn. He mentioned that using RMSD would
produce a better result for goodness-of-fit (if that term is not
synonymous with correlation, I apologise -- I'm still rather new to
this level of statistics):
http://www.lrdc.pitt.edu/schunn/gof/index.html
It was regarding a chart similar to:
http://i.imgur.com/X0gxV.png
In the chart, the calculation for Pearson's, Spearman's, and
Kendall's Tau provide, in my opinion, an incorrect indicator as to
the strength of GAM's fit to the data. I could be wrong here, too.
His suggestion was to use bin the means (in groups of 5 or so) to
reduce the noise.
I doubt that your strategy offers any statistical advantage, but if
you want to play around with it then consider:
binned.x <- round( (x + 2.5)/5)
> d <-
c
(1,3,5,4,3,6,3,1,5,7,8,9,4,3,2,7,3,6,8,9,5,3,1,4,5,8,9,3,3,2,5,7,8,8,5,4,3,2,6,4,3,1,4,5,6,8,9,0,7,7,5,4,3,3,2,1,3,4,5,6,7,9,0,2,4,3,3
)
> binned.d <- round( (d + 2.5)/5)
> print(binned.d)
[1] 1 1 2 1 1 2 1 1 2 2 2 2 1 1 1 2 1 2 2 2 2 1 1 1 2 2 2 1 1 1 2 2
2 2 2 1 1 1
[39] 2 1 1 1 1 2 2 2 2 0 2 2 2 1 1 1 1 1 1 1 2 2 2 2 0 1 1 1 1
That doesn't make sense to me.
Then I blame your powers of exposition. Without some sort of
explicit example the parsing of English is very prone to error. If
you want to pick out elements of x in some pre-specified order in
groups of five then consider:
> x <- 1:100
>
> rep(1:20, each=5)
[1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4 5
5 5
[24] 5 5 6 6 6 6 6 7 7 7 7 7 8 8 8 8 8 9 9 9 9
9 10
[47] 10 10 10 10 11 11 11 11 11 12 12 12 12 12 13 13 13 13 13 14 14
14 14
[70] 14 15 15 15 15 15 16 16 16 16 16 17 17 17 17 17 18 18 18 18 18
19 19
[93] 19 19 19 20 20 20 20 20
> tapply(x, rep(1:20, each=5), mean)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 # this
row is just indices
3 8 13 18 23 28 33 38 43 48 53 58 63 68 73 78 83 88 93 98 # this
row is the means
If you wanted them in random groups of roughly 5, then you could use
sample(x, prob=rep(5/n, n/5))
My impression was that I should try to put every 5 values in a bin,
average that bin, then calculate the RMSD between the observed
values and the values from GAM. In other words (o is observed and m
is model):
Do you intend that m[n] would be the predicted value from a model?
How are you forming the groups of 5? Are they ordered? If so ordered
by observed of by predicted? (In R a "model" is a complex list
structure, but may in some cases have a simple predicted value for
each case. Again a specific example might work wonders.
Looking a bit more at that web page and its linked article and Excel
spreadsheet, it appears you are hoping to construct calibration plots.
It appears that the observations are sorted and then binned by
observed predictor (rather than predicted) values. You then compare
the summed GOF statistic on averages of the predicted and observed
means for some "model" within those bins ... the nature of which is
not at all clear to my eyes at this point. Sounds like a calibration
analysis. I think R packages can offer more sophisticated methods. But
you can at any rate use the methods I offered to bin your cases sorted
on the predictor values.
--
David.
bins <- 5
while( length(o) %% bins != 0 ) {
o <- o[-length(o)]
}
omean <- apply( matrix(o, bins), 2, mean )
while( length(m) %% bins!= 0 ) {
m <- m[-length(m)]
}
mmean <- apply( matrix(m, bins), 2, mean )
sqrt( mean( omean - mmean ) ^ 2 )
But that feels sloppy, error prone, and fragile.
Joris mentioned that I could try using tapply with
cut(d,round(length(d)/5)). I couldn't figure out how to get the
means back from the factors.
Dave
David Winsemius, MD
West Hartford, CT
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
David Winsemius, MD
West Hartford, CT
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.