On Mon, Apr 1, 2013 at 12:42 PM, Gabor Grothendieck <ggrothendi...@gmail.com> wrote:
> On Mon, Apr 1, 2013 at 1:20 PM, Paul Johnson <pauljoh...@gmail.com> wrote:
>> Here's my little discussion example for a quadratic regression:
>>
>> http://pj.freefaculty.org/R/WorkingExamples/regression-quadratic-1.R
>>
>> Students press me to know the benefits of poly() over the more obvious
>> regression formulas.
>>
>> I think I understand the theory on why poly() should be more numerically
>> stable, but I'm having trouble writing down an example that demonstrates
>> the benefit.
>>
>> I am beginning to suspect that because R's lm uses a QR decomposition
>> under the hood, the instability that we used to see when we inverted
>> (X'X) is no longer as serious as it used to be. When the columns in X are
>>
>> x1 x2 x3 x3squared
>>
>> then the QR decomposition is doing about the same thing as we would do
>> if we replaced x3 and x3squared by the poly(x3, 2) results.
>>
>> Or not. I'm not arguing that; I'm thinking out loud, hoping you'll
>> correct me.
>>
> # 1. One benefit: if you fit a higher-degree polynomial using poly(x, n),
> # the lower-degree coefficients will agree with the fit using a lower n.
> # For example, compare these two:
> set.seed(123)
> y <- rnorm(10); x <- 1:10
> lm(y ~ poly(x, 2))
> lm(y ~ poly(x, 3))
>
> # However, that is not true if you don't use orthogonal polynomials.
> # Compare these two:
> lm(y ~ poly(x, 2, raw = TRUE))
> lm(y ~ poly(x, 3, raw = TRUE))

Thanks, Gabor: As usual, you are very helpful. I can't thank you enough. I'm pasting your comments at the end of that R example file I mentioned before, changing the examples to fit the variable names I use there.

I think your reason #2 is the prize winner; I can't see anything wrong with that one. Reason #1 is not quite as strong: it only holds strictly if the poly() term is the only predictor, and users usually have other variables in the model.

If you--or anybody else--has other ideas about this, I am delighted to hear them.
I still hold out hope that the "numerical stability" argument will find some traction.

pj

> # 2. A second advantage is that the t statistics are invariant under
> # shift if you use orthogonal polynomials.
> # Compare these two:
> summary(lm(y ~ poly(x, 2)))
> summary(lm(y ~ poly(x + 1, 2)))
>
> # However, that is not true if you don't use orthogonal polynomials.
> # Compare these two:
> summary(lm(y ~ poly(x, 2, raw = TRUE)))
> summary(lm(y ~ poly(x + 1, 2, raw = TRUE)))
>
> --
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com

--
Paul E. Johnson
Professor, Political Science       Assoc. Director
1541 Lilac Lane, Room 504          Center for Research Methods
University of Kansas               University of Kansas
http://pj.freefaculty.org          http://quant.ku.edu
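On the numerical-stability point, here is one toy example of my own (not from the thread) that can make the problem visible even with lm's QR path: shift x far from zero, so the raw powers 1, x, x^2, x^3 become almost perfectly collinear and the pivoted QR may declare the design rank-deficient, while the orthogonal basis is unaffected.

```r
## Toy stability sketch: a large offset makes the raw power basis
## nearly collinear (relative spread of x is only about 1e-5).
set.seed(123)
y <- rnorm(10)
x <- 1:10 + 1e6

## Raw basis: near-singular design; lm() may drop terms (NA coefficients).
coef(lm(y ~ poly(x, 3, raw = TRUE)))

## Orthogonal basis: poly() centers and orthogonalizes internally,
## so the same shift causes no trouble.
coef(lm(y ~ poly(x, 3)))
```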