On Mon, Apr 1, 2013 at 12:42 PM, Gabor Grothendieck <ggrothendi...@gmail.com> wrote:

> On Mon, Apr 1, 2013 at 1:20 PM, Paul Johnson <pauljoh...@gmail.com> wrote:
> > Here's my little discussion example for a quadratic regression:
> >
> > http://pj.freefaculty.org/R/WorkingExamples/regression-quadratic-1.R
> >
> > Students press me to explain the benefits of poly() over the more
> > obvious regression formulas.
> >
> > I think I understand the theory behind why poly() should be more
> > numerically stable, but I'm having trouble writing down an example
> > that demonstrates the benefit.
> >
> > I am beginning to suspect that because R's lm uses a QR decomposition
> > under the hood, the instability that we used to see when we inverted
> > (X'X) is not as serious as it used to be. When the columns in X are
> >
> > x1 x2 x3 x3squared
> >
> > then the QR decomposition is doing about the same thing as we would do if
> > we replaced x3 and x3squared by the poly(x3, 2) results.
> >
> > Or not. I'm not arguing that; I'm thinking out loud, hoping you'll
> > correct me.
> >
>
> # 1. One benefit: if you fit a higher-degree polynomial using
> # poly(x, n), the lower-degree coefficients will agree with those from
> # a fit with a lower n.
> # For example, compare these two:
> set.seed(123)
> y <- rnorm(10); x <- 1:10
> lm(y ~ poly(x, 2))
> lm(y ~ poly(x, 3))
>
> # However, that is not true if you don't use orthogonal polynomials.
> # Compare these two:
> lm(y ~ poly(x, 2, raw = TRUE))
> lm(y ~ poly(x, 3, raw = TRUE))
>
>
Thanks, Gabor:

As usual, you are very helpful. I can't thank you enough.

I'm pasting your comments at the end of that R example file I mentioned
before, changing the examples to match the variable names I use there.

I think your reason #2 is the prize winner; I can't see anything wrong with
it.

Reason #1 is not quite as strong: it holds strictly only if the poly() term
is the only predictor, and users usually have other variables in the model
(see the sketch just below).
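
Here is a quick sketch of what I mean (the extra predictor z is invented
just for illustration):

set.seed(123)
y <- rnorm(10); x <- 1:10
z <- x + rnorm(10)   # a second predictor, correlated with x
## with another predictor in the model, the lower-degree poly()
## coefficients no longer agree across the two fits:
coef(lm(y ~ z + poly(x, 2)))
coef(lm(y ~ z + poly(x, 3)))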

If you--or anybody else--have other ideas about this, I would be delighted
to hear them.  I still hold out hope that the "numerical stability" argument
will find some traction.
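
For what it's worth, here is one rough way to look for it. This is only a
sketch: kappa() on the model matrix is just one possible measure, and the
poorly scaled predictor is cooked up to exaggerate the effect.

x2 <- seq(1000, 1010, length.out = 100)   # narrow range, far from zero
## approximate condition number of each design matrix:
kappa(model.matrix(~ poly(x2, 3)))              # orthogonal: small
kappa(model.matrix(~ poly(x2, 3, raw = TRUE)))  # raw powers: enormous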

pj



> # 2. A second advantage is that the t statistics are invariant under
> # shift if you use orthogonal polynomials.
> # Compare these two:
> summary(lm(y ~ poly(x, 2)))
> summary(lm(y ~ poly(x+1, 2)))
>
> # However, that is not true if you don't use orthogonal polynomials.
> # Compare these two:
> summary(lm(y ~ poly(x, 2, raw = TRUE)))
> summary(lm(y ~ poly(x+1, 2, raw = TRUE)))
>
> --
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com
>



-- 
Paul E. Johnson
Professor, Political Science      Assoc. Director
1541 Lilac Lane, Room 504      Center for Research Methods
University of Kansas                 University of Kansas
http://pj.freefaculty.org               http://quant.ku.edu
