Re: [R] mgcv: inclusion of random intercept in model - based on p-value of smooth or anova?

Simon Wood Wed, 23 May 2012 02:30:06 -0700

Having looked at this further, I've made some changes in mgcv_1.7-17 to the p-value computations for terms that can be penalized to zero during fitting (e.g. s(x,bs="re"), s(x,m=1) etc).

The Wald statistic based p-values from summary.gam and anova.gam (i.e. what you get from e.g. anova(a) where a is a fitted gam object) are quite well founded for smooth terms that are non-zero under full penalization (e.g. a cubic spline is a straight line under full penalization). For such smooths, an extension of Nychka's (1988) result on CI's for splines gives a well founded distributional result on which to base a Wald statistic. However, the Nychka result requires the smoothing bias to be substantially less than the smoothing estimator variance, and this will often not be the case if smoothing can actually penalize a term to zero (to understand why, see argument in appendix of Marra & Wood, 2012, Scandinavian Journal of Statistics, 39, 53-74).

Simulation testing shows that this theoretical concern has serious practical consequences. So for terms that can be penalized to zero, alternative approximations have to be used, and these are now implemented in mgcv_1.7-17 (see ?summary.gam).
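
As a minimal sketch of where these p-values appear, the following fits a model with a random intercept and reads the term's p-value from the summary table. The data are simulated and the names dat, g, x, fit and p_re are illustrative assumptions, not taken from the original question; the choice of method="REML" is also an assumption.

```r
library(mgcv)

set.seed(1)
n  <- 400
gi <- sample(1:20, n, replace = TRUE)   # hypothetical group index
b  <- rnorm(20, sd = 0.5)               # simulated random intercepts
x  <- runif(n)
y  <- sin(2 * pi * x) + b[gi] + rnorm(n, sd = 0.5)
dat <- data.frame(y = y, x = x, g = factor(gi))

# s(g, bs = "re") is a term that can be penalized to zero during fitting
fit <- gam(y ~ s(x) + s(g, bs = "re"), data = dat, method = "REML")

# Approximate significance of smooth terms, one row per smooth
st <- summary(fit)$s.table
print(st)

# p-value for the random intercept term
p_re <- st["s(g)", "p-value"]
```

The same table is what anova(fit) reports for a single fitted model.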

The approximate test performed by anova(a,b) (a and b are fitted "gam" objects) is less well founded. It is a reasonable approximation when each smooth term in the models could in principle be well approximated by an unpenalized term of rank approximately equal to the edf of the smooth term, but otherwise the p-values produced are likely to be much too small. In particular simulation testing suggests that the test is not to be trusted with s(...,bs="re") terms, and can be poor if the models being compared involve any terms that can be penalized to zero during fitting. (Although the mechanisms are a little different, this is similar to the problem we would have if the models were viewed as regular mixed models and we tried to use a GLRT to test variance components for equality to zero).
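
One way to check this on your own setup is a small simulation in which the null is true, i.e. there is no group effect, comparing how often each approach rejects at the 5% level. This is only a rough sketch under assumed settings (nsim, n, n_groups, the sine signal and method="REML" are all arbitrary choices), not the simulation testing referred to above, and 40 replicates give only a crude estimate.

```r
library(mgcv)

set.seed(2)
nsim <- 40; n <- 200; n_groups <- 20
p_anova <- p_summary <- rep(NA_real_, nsim)

for (i in seq_len(nsim)) {
  x <- runif(n)
  g <- factor(sample(seq_len(n_groups), n, replace = TRUE),
              levels = seq_len(n_groups))
  y <- sin(2 * pi * x) + rnorm(n, sd = 0.5)  # no group effect: null is true
  dat <- data.frame(y = y, x = x, g = g)

  a <- gam(y ~ s(x), data = dat, method = "REML")
  b <- gam(y ~ s(x) + s(g, bs = "re"), data = dat, method = "REML")

  # Two-model comparison, the test described as less well founded
  tab <- suppressWarnings(anova(a, b, test = "F"))
  p_anova[i] <- tab[2, "Pr(>F)"]

  # Single-model p-value for the random intercept term
  p_summary[i] <- summary(b)$s.table["s(g)", "p-value"]
}

# Rejection rates at the 5% level; NA p-values (e.g. zero df difference) are dropped
rate_anova   <- mean(p_anova < 0.05, na.rm = TRUE)
rate_summary <- mean(p_summary < 0.05, na.rm = TRUE)
print(c(anova = rate_anova, summary = rate_summary))
```

If a test were well calibrated, its rejection rate under the null should be close to 0.05.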


These issues are now documented in ?anova.gam and ?summary.gam...

Simon

On 08/05/12 15:01, Martijn Wieling wrote:

Dear useRs,

I am using mgcv version 1.7-16. When I create a model with a few
non-linear terms and a random intercept for (in my case) country using
s(Country,bs="re"), the representative line in my model (i.e.
approximate significance of smooth terms) for the random intercept
reads:
              edf  Ref.df      F  p-value
s(Country) 36.127  58.551  0.644    0.982

Can I interpret this as there being no support for a random intercept
for country? However, when I compare the simpler model to the model
including the random intercept, the latter appears to be a significant
improvement.

anova(gam1,gam2,test="F")

Model 1: ....
Model 2: .... + s(BirthNation, bs="re")
   Resid. Df Resid. Dev     Df Deviance      F    Pr(>F)
1    789.44     416.54
2    753.15     373.54 36.292   43.003 2.3891 1.225e-05 ***

I hope somebody could help me in how I should proceed in these
situations. Do I include the random intercept or not?

I also have a related question. When I used to create a mixed-effects
regression model using lmer and included e.g., an interaction in the
fixed-effects structure, I would test if the inclusion of this
interaction was warranted using anova(lmer1,lmer2). It then would show
me that I invested 1 additional df and the resulting (possibly
significant) improvement in fit of my model.

This approach does not seem to work when using gam. In this case an
apparent investment of 1 degree of freedom for the interaction, might
result in an actual decrease of the degrees of freedom invested by the
total model (caused by a decrease of the edf's of splines in the model
with the interaction). In this case, how would I proceed in
determining if the model including the interaction term is better?

With kind regards,
Martijn Wieling

--
*******************************************
Martijn Wieling
http://www.martijnwieling.nl
wiel...@gmail.com
+31(0)614108622
*******************************************
University of Groningen
http://www.rug.nl/staff/m.b.wieling

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Simon Wood, Mathematical Science, University of Bath BA2 7AY UK
+44 (0)1225 386603               http://people.bath.ac.uk/sw283
