Hi Ben,

>From what you have written, I am not exactly sure what your
seat-of-the-pant sense is coming from.  My pantseat typically does not
tell me much; however, quartic trends tend to less stable than linear,
so I am not terribly surprised.

As two side notes:

x_qt <- x^4 # shorter code-wise
and you can certain just enter a quartic term if the data is just
quartic and you are not really itnerested in the lower trends.

Cheers,

Josh

On Fri, May 6, 2011 at 8:35 AM, Ben Haller <rh...@sticksoftware.com> wrote:
>  Hi all!  I'm getting a model fit from glm() (a binary logistic regression 
> fit, but I don't think that's important) for a formula that contains powers 
> of the explanatory variable up to fourth.  So the fit looks something like 
> this (typing into mail; the actual fit code is complicated because it 
> involves step-down and so forth):
>
> x_sq <- x * x
> x_cb <- x * x * x
> x_qt <- x * x * x * x
> model <- glm(outcome ~ x + x_sq + x_cb + x_qt, binomial)
>
>  This works great, and I get a nice fit.  My question regards the confidence 
> intervals on that fit.  I am calling:
>
> cis <- confint.default(model, level=0.95)
>
> to get the confidence intervals for each coefficient.  This confint.default() 
> call gives me a table like this (where "dispersal" is the real name of what 
> I'm calling "x" above):
>
>                             2.5 %      97.5 %
> (Intercept)              8.7214297   8.9842533
> dispersal              -37.5621412 -35.6629981
> dispersal_sq            66.8077669  74.4490123
> dispersal_cb           -74.5963861 -62.8347057
> dispersal_qt            22.6590159  28.6506186
>
>  A side note: calling confint() instead of confint.default() is much, much 
> slower -- my model has many other terms I'm not discussing here, and is fit 
> to 300,000 data points -- and a check indicates that the intervals returned 
> for my model by confint() are not virtually identical, so this is not the 
> source of my problem:
>
>                             2.5 %      97.5 %
> (Intercept)              8.7216693   8.9844958
> dispersal              -37.5643922 -35.6652276
> dispersal_sq            66.8121519  74.4534753
> dispersal_cb           -74.5995820 -62.8377766
> dispersal_qt            22.6592724  28.6509494
>
>  These tables show the problem: the 95% confidence limits for the quartic 
> term are every bit as wide as the limits for the other terms.  Since the 
> quartic term coefficient gets multiplied by the fourth power of x, this means 
> that the width of the confidence band starts out nice and narrow (when x is 
> close to zero, where the width of the confidence band is pretty much just 
> determined by the confidence limits on the intercept) but balloons out to be 
> tremendously wide for larger values of x.  The width of it looks quite 
> unreasonable to me, given that my fit is constrained by 300,000 data points.
>
>  Intuitively, I would have expected the confidence limits for the quartic 
> term to be much narrower than for, say, the linear term, because the effect 
> of variation in the quartic term is so much larger.  But the way confidence 
> intervals are calculated in R, at least, does not seem to follow my instincts 
> here.  In other words, predicted values using the 2.5% value of the linear 
> coefficient and the 97.5% value of the linear coefficient are really not very 
> different, but predicted values using the 2.5% value of the quadratic 
> coefficient and the 97.5% value of the quadratic coefficient are enormously, 
> wildly divergent -- far beyond the range of variability in the original data, 
> in fact.  I feel quite confident, in a seat-of-the-pants sense, that the 
> actual 95% limits of that quartic coefficient are much, much narrower than 
> what R is giving me.
>
>  Looking at it another way: if I just exclude the quartic term from the glm() 
> fit, I get a dramatically narrower confidence band, even though the fit to 
> the data is not nearly as good (I'm keeping the fourth power for good 
> reasons).  This again seems to me to indicate that the confidence band with 
> the quartic term included is artificially, and incorrectly, wide.  If a fit 
> with reasonably narrow confidence limits can be produced for the model with 
> terms only up to cubic (or, taking this reductio even further, for a 
> quadratic or a linear model, also), why does adding the quadratic term, which 
> improves the quality of the fit to the data, cause the confidence limits to 
> nevertheless balloon outwards?  That seems fundamentally illogical to me, but 
> maybe my instincts are bad.
>
>  So what's going on here?  Is this just a limitation of the way confint() 
> computes confidence intervals?  Is there a better function to use, that is 
> compatible with binomial glm() fits?  Or am I misinterpreting the meaning of 
> the confidence limits in some way?  Or what?
>
>  Thanks for any advice.  I've had trouble finding examples of polynomial fits 
> like this; people sometimes fit the square of the explanatory variable, of 
> course, but going up to fourth powers seems to be quite uncommon, so I've had 
> trouble finding out how others have dealt with these sorts of issues that 
> crop up only with higher powers.
>
> Ben Haller
> McGill University
>
> http://biology.mcgill.ca/grad/ben/
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to