Hello,

I have a conceptual question about the R-squared values reported by lm() and
by postResample() (from the caret package).

I have a multiple linear regression model (let's call it mlr) with an R² value
of 67.52%.
I then use this model to make predictions with the predict() function, using the
same data as input; that is, I use the fitted model to predict the values
associated with the very data it was fitted on.

Next, if I apply postResample() to the observed and predicted data, why do I
get an R² value of 33%? Shouldn't it be at least 67%, as in the original
model, since both use the same data as input?

Here is the code (the data is at the end of the email):

#read input data
input<-read.table("input.csv", header=T)

# multiple linear regression
mlr<-lm(input$TOTAL~-1 + input$A + input$B + input$C + input$D)

#observe the model
summary(mlr)
Call:
lm(formula = input$TOTAL ~ -1 + input$A + input$B + input$C +  input$D)

Residuals:
    Min      1Q  Median      3Q     Max
-25.753  -7.455   2.396  12.615  55.316

Coefficients:
        Estimate Std. Error t value Pr(>|t|)
input$A  10.5985     3.9782   2.664   0.0121 *
input$B   0.3471    17.7731   0.020   0.9845
input$C   0.9468     1.9442   0.487   0.6297
input$D  12.1056     4.7262   2.561   0.0155 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 17.08 on 31 degrees of freedom
Multiple R-Squared: 0.6752,     Adjusted R-squared: 0.6333
F-statistic: 16.11 on 4 and 31 DF,  p-value: 3.090e-07

# as shown above, the R-squared value is 67.52%
# next, let's predict the results with the same input data
prediction<-predict(mlr,input)

# now let's evaluate the predictions, observing the R² and RMSE values that
# postResample returns

postResample(prediction, input$TOTAL)
      RMSE   Rsquared
16.0718506  0.3300378

So here is my question: why do I get a value of 67.52% for R² when
fitting the model (that is, the model explains 67.52% of the variation in the
data), yet when I use this same model on the same input data, postResample
returns a very different value for R²?
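
One possible source of the gap, offered as a sketch rather than a confirmed
answer: the two functions may not be computing the same quantity. The code
below assumes that summary() on a model without an intercept measures the
total sum of squares around zero (an "uncentered" R²), and that postResample()
reports the squared correlation between observed and predicted values. It
recomputes both by hand, without caret. The names fit, obs, pred,
r2_uncentered, r2_centered and r2_cor are made up for this illustration, and
the data frame is transcribed from the input.csv listing at the end of this
email.

```r
# Data transcribed from the input.csv listing in this email (35 rows).
input <- data.frame(
  A = c(1, 1, 1, 2, 0, 2, 1, 1, 1, 1,  3, 2, 2, 1, 0, 2, 0, 1, 1, 1,
        2, 0, 2, 1, 1, 1, 1, 3, 2, 2,  1, 0, 0, 0, 3),
  B = c(rep(0, 15), 1, rep(0, 19)),
  C = c(1, 1, 0, 6, 0, 0, 0, 0, 3, 0,  7, 0, 2, 3, 0, 3, 0, 2, 2, 0,
        6, 0, 0, 0, 0, 3, 0, 7, 0, 2,  3, 0, 0, 0, 7),
  D = c(0, 0, 0, 0, 0, 3, 0, 0, 0, 1,  0, 0, 0, 0, 0, 1, 0, 0, 0, 0,
        0, 0, 3, 0, 0, 0, 1, 0, 0, 0,  0, 0, 0, 0, 0),
  TOTAL = c(3.8, 21.67, 2.92, 42.84, 5.28, 44.86, 8.22, 28.24, 29.69, 78.02,
            51.29, 37.55, 10.82, 17.67, 6.62, 36.49, 37.52, 5.26, 7.32, 2.2,
            39.24, 2.83, 50.93, 4.15, 29.72, 4.26, 25.1, 12.67, 7.99, 17.55,
            3.66, 7.22, 3.82, 28.05, 34.67)
)

# Same no-intercept model as in the email, written with data= so that the
# formula refers to column names instead of input$... expressions.
fit  <- lm(TOTAL ~ -1 + A + B + C + D, data = input)
obs  <- input$TOTAL
pred <- fitted(fit)  # predictions on the training data itself

# Residual sum of squares and RMSE.
rss  <- sum((obs - pred)^2)
rmse <- sqrt(mean((obs - pred)^2))

# Uncentered R-squared: total sum of squares measured around zero.
r2_uncentered <- 1 - rss / sum(obs^2)

# Centered R-squared: total sum of squares measured around mean(obs).
r2_centered <- 1 - rss / sum((obs - mean(obs))^2)

# Squared Pearson correlation between observed and predicted values
# (assumed here to be what postResample() reports as Rsquared).
r2_cor <- cor(obs, pred)^2

print(c(summary_lm  = summary(fit)$r.squared,
        uncentered  = r2_uncentered,
        centered    = r2_centered,
        correlation = r2_cor,
        rmse        = rmse))
```

If those assumptions hold, the summary_lm and uncentered entries should agree
with each other, while the correlation entry is the one to compare with
postResample. For a model that does include an intercept, the centered R² and
the squared correlation coincide on the training data, so the discrepancy
would be specific to the "-1" in the formula.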

Best regards,

Giovane

#input.csv file used as input

"A"     "B"     "C"     "D"     "TOTAL"
1       0       1       0       3.8
1       0       1       0       21.67
1       0       0       0       2.92
2       0       6       0       42.84
0       0       0       0       5.28
2       0       0       3       44.86
1       0       0       0       8.22
1       0       0       0       28.24
1       0       3       0       29.69
1       0       0       1       78.02
3       0       7       0       51.29
2       0       0       0       37.55
2       0       2       0       10.82
1       0       3       0       17.67
0       0       0       0       6.62
2       1       3       1       36.49
0       0       0       0       37.52
1       0       2       0       5.26
1       0       2       0       7.32
1       0       0       0       2.2
2       0       6       0       39.24
0       0       0       0       2.83
2       0       0       3       50.93
1       0       0       0       4.15
1       0       0       0       29.72
1       0       3       0       4.26
1       0       0       1       25.1
3       0       7       0       12.67
2       0       0       0       7.99
2       0       2       0       17.55
1       0       3       0       3.66
0       0       0       0       7.22
0       0       0       0       3.82
0       0       0       0       28.05
3       0       7       0       34.67


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
