Hello, I have a conceptual question about the R-squared values reported by lm() and by postResample() (from the caret package).
I have a multiple linear regression model (let's call it mlr) with an R² value of 67.52%. I then use this model to make predictions with predict(), passing in the same data that I used to fit it. Next, when I apply postResample() to the observed and predicted values, I get an R² of only 33%. Why? Shouldn't it be at least 67%, as in the original model, since both are based on the same input data?

Here is the code (the data is at the end of this email):

# read input data
input <- read.table("input.csv", header = TRUE)

# multiple linear regression (no intercept)
mlr <- lm(input$TOTAL ~ -1 + input$A + input$B + input$C + input$D)

# inspect the model
summary(mlr)

Call:
lm(formula = input$TOTAL ~ -1 + input$A + input$B + input$C + input$D)

Residuals:
    Min      1Q  Median      3Q     Max
-25.753  -7.455   2.396  12.615  55.316

Coefficients:
        Estimate Std. Error t value Pr(>|t|)
input$A  10.5985     3.9782   2.664   0.0121 *
input$B   0.3471    17.7731   0.020   0.9845
input$C   0.9468     1.9442   0.487   0.6297
input$D  12.1056     4.7262   2.561   0.0155 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 17.08 on 31 degrees of freedom
Multiple R-Squared: 0.6752,     Adjusted R-squared: 0.6333
F-statistic: 16.11 on 4 and 31 DF,  p-value: 3.090e-07

# as shown above, the R-squared value is 67.52%

# next, predict the results with the same input data
prediction <- predict(mlr, input)

# now evaluate the predictions, looking at the R² and RMSE values
# that postResample() returns
postResample(input$TOTAL, prediction)

      RMSE   Rsquared
16.0718506  0.3300378

So here is my question: the model summary reports an R² of 67.52% (which I read as the model explaining 67.52% of the variation in TOTAL), yet when I use this same model on the same input data, postResample() returns a very different R² of 33%. Why do the two values differ?
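In case it helps to narrow this down, here is a minimal sketch of how the different figures could be computed by hand. It uses R's built-in cars data as a stand-in rather than my input.csv. It assumes, as far as I understand the documentation, that summary.lm() measures the total sum of squares about zero when the model has no intercept, while postResample() reports the squared correlation between observed and predicted values. The names r2_uncentered, r2_centered and r2_cor are just labels made up for this example.

```r
# Sketch only: uses R's built-in `cars` data as a stand-in, not input.csv.
fit  <- lm(dist ~ -1 + speed, data = cars)  # no-intercept model, like mlr
obs  <- cars$dist
pred <- fitted(fit)

# Residual sum of squares, shared by all the measures below.
rss <- sum((obs - pred)^2)

# R-squared with the total sum of squares taken about zero.
# Assumption: this is what summary.lm() reports when there is no intercept.
r2_uncentered <- 1 - rss / sum(obs^2)

# R-squared with the total sum of squares taken about mean(obs).
r2_centered <- 1 - rss / sum((obs - mean(obs))^2)

# Squared correlation between observed and predicted values.
# Assumption: this is what caret::postResample() labels "Rsquared".
r2_cor <- cor(obs, pred)^2

# RMSE divides by n, whereas the residual standard error divides by the
# residual degrees of freedom, so those two differ slightly as well.
rmse <- sqrt(rss / length(obs))

print(c(summary_lm  = summary(fit)$r.squared,
        uncentered  = r2_uncentered,
        centered    = r2_centered,
        squared_cor = r2_cor,
        rmse        = rmse))
```

If the assumptions hold, the first two printed values should agree with each other and differ from the squared correlation, which would be the same kind of gap I am seeing between summary(mlr) and postResample().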
Best regards,
Giovane

# input.csv file used as input
"A" "B" "C" "D" "TOTAL"
1 0 1 0 3.8
1 0 1 0 21.67
1 0 0 0 2.92
2 0 6 0 42.84
0 0 0 0 5.28
2 0 0 3 44.86
1 0 0 0 8.22
1 0 0 0 28.24
1 0 3 0 29.69
1 0 0 1 78.02
3 0 7 0 51.29
2 0 0 0 37.55
2 0 2 0 10.82
1 0 3 0 17.67
0 0 0 0 6.62
2 1 3 1 36.49
0 0 0 0 37.52
1 0 2 0 5.26
1 0 2 0 7.32
1 0 0 0 2.2
2 0 6 0 39.24
0 0 0 0 2.83
2 0 0 3 50.93
1 0 0 0 4.15
1 0 0 0 29.72
1 0 3 0 4.26
1 0 0 1 25.1
3 0 7 0 12.67
2 0 0 0 7.99
2 0 2 0 17.55
1 0 3 0 3.66
0 0 0 0 7.22
0 0 0 0 3.82
0 0 0 0 28.05
3 0 7 0 34.67