No. It's a cute problem, though: the definition of R^2 changes when you drop the intercept, because the "empty" model used for calculating the total sum of squares always predicts 0 (so the total sum of squares is the sum of the squared observations themselves, not centered around the sample mean).
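To see where the two R^2 values come from, something along these lines (untested, just a sketch using your variable names N and N_alt) reproduces what summary() reports in each case:

    fit1 <- lm(N ~ N_alt)        # with intercept
    fit0 <- lm(N ~ N_alt - 1)    # without intercept
    y1 <- fitted(fit1) + resid(fit1)   # response values actually used (NAs dropped)
    y0 <- fitted(fit0) + resid(fit0)
    # With an intercept, the baseline is the model that always predicts mean(y):
    1 - sum(resid(fit1)^2) / sum((y1 - mean(y1))^2)   # = summary(fit1)$r.squared
    # Without one, the baseline is the model that always predicts 0:
    1 - sum(resid(fit0)^2) / sum(y0^2)                # = summary(fit0)$r.squared

So the second R^2 only says the fitted line explains most of the raw sum of squares of N, which is easy when the observations sit far from 0; it isn't comparable to the first R^2 at all.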
Your interpretation of the p-value for the intercept in the first model is also backwards: 0.9535 is extremely weak evidence against the hypothesis that the intercept is 0. That is, the intercept might be near zero, but it could also be something very different. With a standard error of 229, your 95% confidence interval for the intercept (if you trusted it based on other things) would have a margin of error of well over 400 (roughly qt(0.975, 20) * 229.0764, or about 478). If you told me that an intercept of, say, 350 or 400 were consistent with your knowledge of the problem, I wouldn't blink.

This is a very small data set: if you sent an R command such as

x <- c(x1, x2, ..., xn)
y <- c(y1, y2, ..., yn)

you might even get some more interesting feedback. One of the many good intro stats textbooks might also be helpful as you get up to speed.

Jay

---------------------------------------------
Original post:

Message: 135
Date: Fri, 18 Feb 2011 11:49:41 +0100
From: Jan <jrheinlaen...@gmx.de>
To: "R-help@r-project.org list" <r-help@r-project.org>
Subject: [R] lm without intercept
Message-ID: <1298026181.2847.19.camel@jan-laptop>
Content-Type: text/plain; charset="UTF-8"

Hi,

I am not a statistics expert, so I have this question. A linear model gives me the following summary:

Call:
lm(formula = N ~ N_alt)

Residuals:
    Min      1Q  Median      3Q     Max
-110.30  -35.80  -22.77   38.07  122.76

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  13.5177   229.0764   0.059   0.9535
N_alt         0.2832     0.1501   1.886   0.0739 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 56.77 on 20 degrees of freedom
  (16 observations deleted due to missingness)
Multiple R-squared: 0.151,     Adjusted R-squared: 0.1086
F-statistic: 3.558 on 1 and 20 DF,  p-value: 0.07386

The regression is not very good (high p-value, low R-squared). The Pr value for the intercept seems to indicate that it is zero with a very high probability (95.35%). So I repeat the regression, forcing the intercept to zero:

Call:
lm(formula = N ~ N_alt - 1)

Residuals:
    Min      1Q  Median      3Q     Max
-110.11  -36.35  -22.13   38.59  123.23

Coefficients:
      Estimate Std. Error t value Pr(>|t|)
N_alt 0.292046   0.007742   37.72   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 55.41 on 21 degrees of freedom
  (16 observations deleted due to missingness)
Multiple R-squared: 0.9855,    Adjusted R-squared: 0.9848
F-statistic: 1423 on 1 and 21 DF,  p-value: < 2.2e-16

1. Is my interpretation correct?
2. Is it possible that just by forcing the intercept to become zero, a bad regression becomes an extremely good one?
3. Why doesn't lm suggest a value of zero (or near zero) by itself if the regression is so much better with it?

Please excuse my ignorance.

Jan Rheinländer

--
John W. Emerson (Jay)
Associate Professor of Statistics
Department of Statistics
Yale University
http://www.stat.yale.edu/~jay

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.