On Dec 11, 2007 3:35 PM, Giovane <[EMAIL PROTECTED]> wrote:
>
> So here comes my doubt: why do I have a value of 67.52% for R² when
> creating the model (that is, the model explains 67.52% of the data) and
> when I use this same model on the same input data, why does postResample
> return a very different value associated to R²?
>

Let's get in the WayBack machine and return to 4 days ago when I said:

> As has been previously noted on this list, there are a number of
> formulas for R-squared. This function uses the square of the
> correlation between the observed and predicted. The next version of
> caret will offer a choice of formulas.

For your data:

    > cor(prediction, input$TOTAL)^2
    [1] 0.3300378

For R-squared, summary.lm uses

    ans$r.squared <- mss/(mss + rss)
    ans$adj.r.squared <- 1 - (1 - ans$r.squared) * ((n - df.int)/rdf)

and for your data rdf = 31, df.int = 0 and n = 35.

In other words, the R-squared estimate from summary.lm adjusts for the
degrees of freedom and postResample does not.

Why doesn't it use the df? In ?postResample you would see

  "Note that many models have more predictors (or parameters) than data
  points, so the typical mean squared error denominator (n - p) does not
  apply. Root mean squared error is calculated using
  sqrt(mean((pred - obs)^2)). Also, R-squared is calculated as the square
  of the correlation between the observed and predicted outcomes."

Because caret is used to compare different types of models (say a linear
regression and a support vector machine), we use a biased estimate of the
root MSE so that the RMSE can be compared directly across them. Many of
these models do not have an explicit number of parameters, so we use

    mse <- mean((pred - obs)^2)

Max
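
For anyone who wants to see the two calculations side by side, here is a
minimal sketch on simulated data. The data, the variable names (x1, x2, y,
fit, pred) and the no-intercept formula are assumptions made for
illustration; they are not the original poster's data or model. The
no-intercept fit is chosen only because df.int = 0 was reported above. In
that case summary.lm takes mss as the uncentered sum of squared fitted
values, which is one more way its R-squared can differ from a squared
correlation.

```r
# Simulated data (hypothetical; NOT the poster's data set)
set.seed(1)
n  <- 35
x1 <- runif(n, 5, 10)
x2 <- runif(n, 5, 10)
y  <- 20 + 0.5 * x1 + 0.5 * x2 + rnorm(n)

# Fit without an intercept (assumption), so df.int is 0 inside summary.lm
fit  <- lm(y ~ x1 + x2 - 1)
pred <- fitted(fit)

# R-squared as summary.lm computes it: mss / (mss + rss).
# With no intercept, mss is the uncentered sum of squared fitted values.
rss <- sum(residuals(fit)^2)
mss <- sum(pred^2)
r2_summary <- mss / (mss + rss)

# R-squared as the squared correlation of observed and predicted,
# which is the formula described in ?postResample
r2_cor <- cor(pred, y)^2

# Root MSE with no degrees-of-freedom correction
rmse <- sqrt(mean((pred - y)^2))

# Residual standard error as in summary.lm: rss divided by the residual df
sigma_lm <- sqrt(rss / df.residual(fit))

print(c(r2_summary = r2_summary, r2_cor = r2_cor))
print(c(rmse = rmse, sigma_lm = sigma_lm))
```

Here r2_summary should match summary(fit)$r.squared and sigma_lm should
match summary(fit)$sigma, while r2_cor and rmse are the df-free versions
that can be computed for any model that returns predictions.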