On 21/09/2007 4:47 PM, Yves Moisan wrote:
> I am puzzled at the use of regression. I have a categorical variable
> ClassePop33000 which factors a Population variable into 3 levels. I want
> to investigate whether that categorical variable has some relation with
> my dependent variable, so I go:
>
> lm(Cout.ton ~ ClassePop33000, data=ech2)
>
> Call:
> lm(formula = Cout.ton ~ ClassePop33000, data = ech2)
>
> Residuals:
>     Min      1Q  Median      3Q     Max
> -182.24  -62.91  -22.76   66.38  277.39
>
> Coefficients:
>                                    Estimate Std. Error t value Pr(>|t|)
> (Intercept)                          231.66      11.50  20.141  < 2e-16 ***
> ClassePop33000[T.[3000,25000)]       -72.91      16.70  -4.366 2.19e-05 ***
> ClassePop33000[T.[25000,10000000)]   -95.17      19.92  -4.777 3.82e-06 ***
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> Residual standard error: 97.6 on 170 degrees of freedom
> Multiple R-Squared: 0.1502,     Adjusted R-squared: 0.1402
> F-statistic: 15.02 on 2 and 170 DF,  p-value: 9.818e-07
>
> Now I discovered one could omit the intercept and therefore have
> coefficients for the N levels of the categorical variable. So I went:
>
> lm(Cout.ton ~ ClassePop33000 + 0, data=ech2)
>
> Call:
> lm(formula = Cout.ton ~ ClassePop33000 + 0, data = ech2)
>
> Residuals:
>     Min      1Q  Median      3Q     Max
> -182.24  -62.91  -22.76   66.38  277.39
>
> Coefficients:
>                                Estimate Std. Error t value Pr(>|t|)
> ClassePop33000[1,3000)           231.66      11.50  20.141  < 2e-16 ***
> ClassePop33000[3000,25000)       158.75      12.11  13.114  < 2e-16 ***
> ClassePop33000[25000,10000000)   136.49      16.27   8.391  1.8e-14 ***
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> Residual standard error: 97.6 on 170 degrees of freedom
> Multiple R-Squared: 0.7922,     Adjusted R-squared: 0.7885
> F-statistic: 216 on 3 and 170 DF,  p-value: < 2.2e-16
>
> I tried the very pedagogical examples at
> http://www.stat.umn.edu/geyer/5102/examp/dummy.html and plotting the
> regression lines with abline gives me the exact same lines whether I fit
> with or without an intercept. So why do the R-squared values differ? At
> least the p-values are of the same order of magnitude, but I don't
> understand the drastic difference in R-squared. Pointers to stats 101,
> anyone?
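Note first that your two calls fit exactly the same model; only the
parameterization differs. A minimal sketch with simulated data (ech2 isn't
posted, so the factor levels, group means, and sample size below are
invented, and the R-squared magnitudes are just what this simulation
happens to give):

set.seed(1)
pop  <- factor(sample(c("small", "medium", "large"), 173, replace = TRUE),
               levels = c("small", "medium", "large"))
cost <- c(230, 160, 135)[as.integer(pop)] + rnorm(173, sd = 95)
d    <- data.frame(Cout.ton = cost, ClassePop33000 = pop)

fit1 <- lm(Cout.ton ~ ClassePop33000, data = d)      # with intercept
fit0 <- lm(Cout.ton ~ ClassePop33000 + 0, data = d)  # intercept suppressed

all.equal(fitted(fit1), fitted(fit0))  # TRUE: identical fitted values
summary(fit1)$r.squared                # modest (roughly 0.15 here)
summary(fit0)$r.squared                # much larger (roughly 0.8 here)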
The standard definition of R-squared assumes there's an intercept in the
model. If you suppress the intercept, a different definition is needed, so
the two values aren't comparable. Specifically, with an intercept
summary.lm reports 1 - RSS/TSS with the total sum of squares taken about
the mean of the response; without an intercept it takes the sum of squares
about zero instead (see ?summary.lm). The sum of squares about zero is
usually much larger, so the no-intercept R-squared looks inflated even
though the fitted values are identical.

Duncan Murdoch
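P.S. You can check the two definitions directly against summary()'s
output. A quick sketch, reusing the simulated fits from the example above
(rss1 and rss0 come out equal here, since the two calls fit the same
model):

y    <- d$Cout.ton
rss1 <- sum(residuals(fit1)^2)   # residual sum of squares, with intercept
rss0 <- sum(residuals(fit0)^2)   # same fit, so the same RSS

1 - rss1 / sum((y - mean(y))^2)  # TSS about the mean: matches summary(fit1)
1 - rss0 / sum(y^2)              # TSS about zero: matches summary(fit0)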