Hi Ben, >From what you have written, I am not exactly sure what your seat-of-the-pant sense is coming from. My pantseat typically does not tell me much; however, quartic trends tend to less stable than linear, so I am not terribly surprised.
As two side notes: x_qt <- x^4 # shorter code-wise and you can certain just enter a quartic term if the data is just quartic and you are not really itnerested in the lower trends. Cheers, Josh On Fri, May 6, 2011 at 8:35 AM, Ben Haller <rh...@sticksoftware.com> wrote: > Hi all! I'm getting a model fit from glm() (a binary logistic regression > fit, but I don't think that's important) for a formula that contains powers > of the explanatory variable up to fourth. So the fit looks something like > this (typing into mail; the actual fit code is complicated because it > involves step-down and so forth): > > x_sq <- x * x > x_cb <- x * x * x > x_qt <- x * x * x * x > model <- glm(outcome ~ x + x_sq + x_cb + x_qt, binomial) > > This works great, and I get a nice fit. My question regards the confidence > intervals on that fit. I am calling: > > cis <- confint.default(model, level=0.95) > > to get the confidence intervals for each coefficient. This confint.default() > call gives me a table like this (where "dispersal" is the real name of what > I'm calling "x" above): > > 2.5 % 97.5 % > (Intercept) 8.7214297 8.9842533 > dispersal -37.5621412 -35.6629981 > dispersal_sq 66.8077669 74.4490123 > dispersal_cb -74.5963861 -62.8347057 > dispersal_qt 22.6590159 28.6506186 > > A side note: calling confint() instead of confint.default() is much, much > slower -- my model has many other terms I'm not discussing here, and is fit > to 300,000 data points -- and a check indicates that the intervals returned > for my model by confint() are not virtually identical, so this is not the > source of my problem: > > 2.5 % 97.5 % > (Intercept) 8.7216693 8.9844958 > dispersal -37.5643922 -35.6652276 > dispersal_sq 66.8121519 74.4534753 > dispersal_cb -74.5995820 -62.8377766 > dispersal_qt 22.6592724 28.6509494 > > These tables show the problem: the 95% confidence limits for the quartic > term are every bit as wide as the limits for the other terms. Since the > quartic term coefficient gets multiplied by the fourth power of x, this means > that the width of the confidence band starts out nice and narrow (when x is > close to zero, where the width of the confidence band is pretty much just > determined by the confidence limits on the intercept) but balloons out to be > tremendously wide for larger values of x. The width of it looks quite > unreasonable to me, given that my fit is constrained by 300,000 data points. > > Intuitively, I would have expected the confidence limits for the quartic > term to be much narrower than for, say, the linear term, because the effect > of variation in the quartic term is so much larger. But the way confidence > intervals are calculated in R, at least, does not seem to follow my instincts > here. In other words, predicted values using the 2.5% value of the linear > coefficient and the 97.5% value of the linear coefficient are really not very > different, but predicted values using the 2.5% value of the quadratic > coefficient and the 97.5% value of the quadratic coefficient are enormously, > wildly divergent -- far beyond the range of variability in the original data, > in fact. I feel quite confident, in a seat-of-the-pants sense, that the > actual 95% limits of that quartic coefficient are much, much narrower than > what R is giving me. > > Looking at it another way: if I just exclude the quartic term from the glm() > fit, I get a dramatically narrower confidence band, even though the fit to > the data is not nearly as good (I'm keeping the fourth power for good > reasons). This again seems to me to indicate that the confidence band with > the quartic term included is artificially, and incorrectly, wide. If a fit > with reasonably narrow confidence limits can be produced for the model with > terms only up to cubic (or, taking this reductio even further, for a > quadratic or a linear model, also), why does adding the quadratic term, which > improves the quality of the fit to the data, cause the confidence limits to > nevertheless balloon outwards? That seems fundamentally illogical to me, but > maybe my instincts are bad. > > So what's going on here? Is this just a limitation of the way confint() > computes confidence intervals? Is there a better function to use, that is > compatible with binomial glm() fits? Or am I misinterpreting the meaning of > the confidence limits in some way? Or what? > > Thanks for any advice. I've had trouble finding examples of polynomial fits > like this; people sometimes fit the square of the explanatory variable, of > course, but going up to fourth powers seems to be quite uncommon, so I've had > trouble finding out how others have dealt with these sorts of issues that > crop up only with higher powers. > > Ben Haller > McGill University > > http://biology.mcgill.ca/grad/ben/ > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://www.joshuawiley.com/ ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.