On Tue, May 8, 2012 at 3:45 PM, array chip <arrayprof...@yahoo.com> wrote:
> Thanks again Peter. What about the argument that because a low R-square
> (e.g., R^2 = 0.2) indicates the model variance is not sufficiently explained
> by the factors in the model, there might be additional factors that should be
> identified and included in the model? And if these additional factors were
> indeed included, it might change the significance of the factor of interest
> that previously showed a significant coefficient. In other words, if R-square
> is low, the significant coefficient observed is not trustworthy.
>
> What's your opinion on this argument?
I think that argument is silly. I'm sorry if that is too blunt. It's just plain superficial, and it reflects a poor understanding of what the linear model is all about. If you have other variables that might "belong" in the model, run them and test. The R-square, whether low or high, says nothing direct about whether those other variables exist.

Here's my authority, Arthur Goldberger (A Course in Econometrics, 1991, p. 177):

“Nothing in the CR (Classical Regression) model requires that R2 be high. Hence, a high R2 is not evidence in favor of the model, and a low R2 is not evidence against it.”

I found that reference in Anders Skrondal and Sophia Rabe-Hesketh, Generalized Latent Variable Modeling: Multilevel, Longitudinal, and Structural Equation Models, Boca Raton, FL: Chapman & Hall/CRC, 2004. From Section 8.5.2:

"Furthermore, how badly the baseline model fits the data depends greatly on the magnitude of the parameters of the true model. For instance, consider estimating a simple parallel measurement model. If the true model is a congeneric measurement model (with considerable variation in factor loadings and measurement error variances between items), the fit index could be high simply because the null model fits very poorly, i.e. because the reliabilities of the items are high. However, if the true model is a parallel measurement model with low reliabilities the fit index could be low although we are estimating the correct model. Similarly, estimating a simple linear regression model can yield a high R2 if the relationship is actually quadratic with a considerable linear trend and a low R2 when the model is true but with a small slope (relative to the overall variance)."

The small simulation in the P.S. at the end of this message illustrates both of those regression situations.

For a detailed argument that the R-square is not a way to decide whether a model is "good" or "bad", see King, Gary (1986). "How Not to Lie with Statistics: Avoiding Common Mistakes in Quantitative Political Science." American Journal of Political Science, 30(3), 666–687. doi:10.2307/2111095

pj

--
Paul E. Johnson
Professor, Political Science        Assoc. Director
1541 Lilac Lane, Room 504           Center for Research Methods
University of Kansas                University of Kansas
http://pj.freefaculty.org           http://quant.ku.edu
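P.S. Here is a toy simulation of the two regression situations in that last quote. The variable names, coefficients, and seed are mine, chosen purely for illustration: a correctly specified model with a small slope relative to the noise gives a tiny R-square but a perfectly well-behaved, highly significant slope estimate, while a straight-line fit to a quadratic relationship with a strong linear trend gives a big R-square even though the functional form is wrong.

## Correct model, small slope relative to the error variance:
## R-square is low, but the slope estimate is unbiased and clearly significant.
set.seed(1234)
n <- 1000
x <- rnorm(n)
y <- 0.5 * x + rnorm(n, sd = 2)
m1 <- lm(y ~ x)
summary(m1)$r.squared            # should be low, roughly 0.05-0.06
coef(summary(m1))["x", ]         # estimate near 0.5, very small p-value

## Misspecified model: the truth is quadratic with a considerable linear
## trend, but a simple straight-line fit still produces a high R-square.
z <- rnorm(n, mean = 3)
w <- z + 0.3 * z^2 + rnorm(n, sd = 0.5)
m2 <- lm(w ~ z)                  # wrong functional form
summary(m2)$r.squared            # should be high, roughly 0.95

So the size of R-square, by itself, tells you neither that the significant coefficient in the first fit is untrustworthy nor that the second fit is the right model.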