I am puzzled by something in regression.  I have a categorical variable
ClassePop33000, which cuts a Population variable into 3 levels.  I want to
investigate whether that categorical variable is related to my dependent
variable, so I run:

lm(Cout.ton ~ ClassePop33000, data=ech2)

Call:
lm(formula = Cout.ton ~ ClassePop33000, data = ech2)

Residuals:
    Min      1Q  Median      3Q     Max 
-182.24  -62.91  -22.76   66.38  277.39 

Coefficients:
                                   Estimate Std. Error t value Pr(>|t|)    
(Intercept)                          231.66      11.50  20.141  < 2e-16 ***
ClassePop33000[T.[3000,25000)]       -72.91      16.70  -4.366 2.19e-05 ***
ClassePop33000[T.[25000,10000000)]   -95.17      19.92  -4.777 3.82e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Residual standard error: 97.6 on 170 degrees of freedom
Multiple R-Squared: 0.1502,     Adjusted R-squared: 0.1402 
F-statistic: 15.02 on 2 and 170 DF,  p-value: 9.818e-07 
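
(To keep the code below self-contained I store this fit as fit1; the name
is mine.)  If I read the default treatment contrasts correctly, the
intercept is the mean of the baseline level [1,3000) and the other two
coefficients are offsets from it:

fit1 <- lm(Cout.ton ~ ClassePop33000, data=ech2)
coef(fit1)[1]                  # 231.66, mean of the baseline level [1,3000)
coef(fit1)[1] + coef(fit1)[2]  # 231.66 - 72.91 = 158.75 for [3000,25000)
coef(fit1)[1] + coef(fit1)[3]  # 231.66 - 95.17 = 136.49 for [25000,10000000)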


Then I discovered that one can omit the intercept and thereby get one
coefficient for each of the N levels of the categorical variable, so I ran:

lm(Cout.ton ~ ClassePop33000 + 0, data=ech2)

Call:
lm(formula = Cout.ton ~ ClassePop33000 + 0, data = ech2)

Residuals:
    Min      1Q  Median      3Q     Max 
-182.24  -62.91  -22.76   66.38  277.39 

Coefficients:
                               Estimate Std. Error t value Pr(>|t|)    
ClassePop33000[1,3000)           231.66      11.50  20.141  < 2e-16 ***
ClassePop33000[3000,25000)       158.75      12.11  13.114  < 2e-16 ***
ClassePop33000[25000,10000000)   136.49      16.27   8.391  1.8e-14 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Residual standard error: 97.6 on 170 degrees of freedom
Multiple R-Squared: 0.7922,     Adjusted R-squared: 0.7885 
F-statistic:   216 on 3 and 170 DF,  p-value: < 2.2e-16 
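
Storing this second fit as fit2 (again, my own name), the coefficients now
look like plain per-level means, which a quick sanity check seems to
confirm:

fit2 <- lm(Cout.ton ~ ClassePop33000 + 0, data=ech2)
tapply(ech2$Cout.ton, ech2$ClassePop33000, mean)  # 231.66, 158.75, 136.49
coef(fit2)                                        # the same three values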


I worked through the very pedagogical examples at
http://www.stat.umn.edu/geyer/5102/examp/dummy.html, and plotting the
regression lines with abline() gives me exactly the same lines whether or
not I include the intercept.  So why does the R-squared differ?  At least
the p-values are of the same order of magnitude, but I don't understand
the drastic difference in R-squared.  Pointers to Stats 101, anyone?
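
For completeness, a minimal check along those lines (using fit1 and fit2
from above): the fitted values agree exactly, and computing R-squared by
hand with the usual centered total sum of squares reproduces the first
figure but not the second:

all.equal(fitted(fit1), fitted(fit2))              # TRUE: identical fits
y <- ech2$Cout.ton
1 - sum(residuals(fit1)^2) / sum((y - mean(y))^2)  # 0.1502, as reported
1 - sum(residuals(fit2)^2) / sum((y - mean(y))^2)  # also 0.1502, not 0.7922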

TIA