Re: [R] Root mean square on binned GAM results

David Winsemius Sat, 19 Jun 2010 07:18:50 -0700

I have replied offlist to Mr. Jarvis with my reasons for not proceeding further with advice about implementing this request. Others should feel free to step in if they are so inclined.


--
David.


On Jun 19, 2010, at 12:44 AM, David Jarvis wrote:

Hi, David.
Let me start at the beginning. Between the years (y) 1900 to 2009 I have some observed temperature readings (o). For example:
y <- seq(1900, 2009)
o <- runif(110, 9, 15)
So the ordering is fixed: y and o are a time series (shown in the linked image below). I then calculate a naïve, non-parameterised model (m) of the data using GAM, as follows:
m <- data.frame( y, fitted( gam( o ~ s(y) ) ) )
The values from m are then actually plotted as the trend line depicted at:
http://i.imgur.com/X0gxV.png
What I am trying to do now is to calculate how accurately GAM fits the data. The suggestion I was given was to use RMSE on the observed data versus the model data. It was also suggested that I use mean bins, with each bin containing 5 values, to reduce the amount of error in the calculation. Algorithmically, I pictured it as:
        1. Let index = 1
        2. Let size = 5
        3. Let o = vector of observed data
        4. Let ob = empty vector
        5. Append mean( o[index:(index+size-1)] ) into ob
        6. Let index = index + size
        7. Repeat from Step 5 until no more elements in o
At this point, ob would contain the average of: the first five values, the second five values, and so on. Thus length( ob ) = round(length( o ) / 5).
I would then repeat the same calculation on m to get mb, the model's bins.
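The binning steps above can be sketched in vectorised R without an explicit loop. This is only a sketch under assumptions: bin_means is a hypothetical helper name, the seed is arbitrary, and a short final bin is simply averaged over whatever values it holds.

```r
# Hypothetical helper: mean of consecutive, non-overlapping bins of `size` values.
bin_means <- function(v, size = 5) {
  # Bin label for each element: 1,1,1,1,1,2,2,2,2,2,...
  bins <- ceiling(seq_along(v) / size)
  # tapply() averages within each label; a short final bin is averaged as-is.
  as.numeric(tapply(v, bins, mean))
}

set.seed(1)                 # arbitrary seed, only for reproducibility
o  <- runif(110, 9, 15)     # stand-in observations, as in the example above
ob <- bin_means(o)          # binned observations
length(ob)                  # number of bins
```

The same call on the model's fitted values would give mb.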
With those averages, I could use ob and mb to calculate the normalised root mean square deviation:
nrmse <- sqrt( mean( ( ob - mb ) ^ 2 ) ) / (max( ob ) - min( ob ))

Then turn that into a percentage:

100 * (1 - nrmse)
At that point I was hoping I could say that, in general, the result indicates how closely the model fits the data. The closer to 100%, the more accurate the trend line.
As you can tell, I have very little experience in statistics and R so any feedback, suggestions, or general guidance would be greatly appreciated.
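Put together, the calculation described above might look like the sketch below. Several things here are assumptions rather than anything from the thread: bin_means and nrmse_of are hypothetical helper names, the seed is arbitrary, and loess() from base R stands in for the GAM so that the example runs without extra packages. With mgcv loaded, fitted( gam( o ~ s(y) ) ) could be substituted for the loess line.

```r
# Hypothetical helpers, named only for this sketch.
bin_means <- function(v, size = 5) {
  as.numeric(tapply(v, ceiling(seq_along(v) / size), mean))
}

# RMSE of the differences, divided by the range of the observed values.
nrmse_of <- function(obs, fit) {
  sqrt(mean((obs - fit)^2)) / (max(obs) - min(obs))
}

set.seed(1)                      # arbitrary seed, only for reproducibility
y <- seq(1900, 2009)
o <- runif(110, 9, 15)           # stand-in observations

# ASSUMPTION: loess() is a stand-in smoother, not the GAM from the post.
fit <- fitted(loess(o ~ y))

ob <- bin_means(o)               # binned observations
mb <- bin_means(fit)             # binned model values
nrmse <- nrmse_of(ob, mb)
score <- 100 * (1 - nrmse)       # percentage form of the result
score
```

Note that the square is taken inside mean(); squaring the mean of the differences instead lets positive and negative errors cancel and gives a value near zero for almost any smoother.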
Dave

P.S.
The years, the type of weather data, and the locations that the measurements were taken can all be selected by users when they generate the report. So sometimes the data will have 110 years, inclusive, other times it could be 37 years (thus 37 data points). So choosing to average 5 elements per bin is a bit arbitrary... I am looking to get something working first before tweaking the possible parameters for the calculation.
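When the series length is not a multiple of the bin size, as with 37 points and bins of 5, the last bin is incomplete and something has to be decided about it. The sketch below shows two options; bin_means_opt and its drop_partial argument are hypothetical names introduced only for illustration, and the input is a simple placeholder sequence rather than real data.

```r
# Hypothetical helper: bin means with an option to discard an incomplete last bin.
bin_means_opt <- function(v, size = 5, drop_partial = FALSE) {
  if (drop_partial) {
    n_full <- length(v) %/% size       # number of complete bins
    v <- v[seq_len(n_full * size)]     # keep only values that fill whole bins
  }
  as.numeric(tapply(v, ceiling(seq_along(v) / size), mean))
}

o37 <- seq(1, 37)                      # placeholder series of 37 values

keep <- bin_means_opt(o37)                       # last bin averages 2 values
drop <- bin_means_opt(o37, drop_partial = TRUE)  # last 2 values discarded

length(keep)   # ceiling(37 / 5)
length(drop)   # 37 %/% 5
```

Keeping the partial bin gives ceiling(n / size) bins rather than round(n / size), and that last mean rests on fewer values than the others.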
Thanks again!


David Winsemius, MD
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
