Re: [R] random effects in library mgcv

Simon Wood Thu, 26 Apr 2012 10:29:07 -0700

On 25/04/12 14:02, Mabille, Geraldine wrote:

Hi, I am working with gam models in the mgcv library. My response
variable (Y) is binary (0/1), and my dataset contains repeated
measures over 110 individuals (same number of 0/1 within a given
individual: e.g. 345-zero and 345-one for individual A, 226-zero and
226-one for individual B, etc.). The variable Factor is separating
the individuals in three groups according to mass (group 0,1,2),
Factor1 is a binary variable coding for individuals of group1,
Factor2 is a binary variable for individuals of group 2 I use gam
models of this sort with random effects coded using a s( ...,
bs="re") term: gm<-gam(Y~Factor+te(x1,x2,by=Factor)
)+s(Individual,bs="re"),dat=Data,family=binomial(link=logit),method="REML")

gm1<-gam(Y~Factor+te(x1,x2)+ te(x1,x2,by=Factor1)+
te(x1,x2,by=Factor2)+s(Individual,bs="re"),dat=Data,family=binomial(link=logit),method="REML")


1)First question: is it OK to use gam() to model a binary variable
with random effects coded as a "bs="re" term"?? I have read that the
gamm4() function gives better performance than gamm() to deal with
binary variables when random effects are coded as:
random=~(1|Individual) but does that mean that binary variables
should not be used as response variable in gam() with random effects
coded as bs="re"???

-- It's fine to use 'gam'. The problem with gamm, is that it uses PQL,
which can perform poorly in the binary case (mostly related to

estimating the scale parameter rather than treating it as fixed). Bothgam and gamm4 use a Laplace approximation that behaves much better thatPQL in the binary case.

2)Second question: For some models, I obtain a p-value=NA and
Chi-square=0 for the s(Individual) term, and for some other models a
p-value=1 and high Chi-square. The difference between one model that
can estimate a p-value and one that cannot is very slight: for
example if I use a variable x3 instead of x2 in a model, it can
change from p-value=NA to p-value=1. Does anyone know what can be
happening?

-- basically both cases are providing the same information --- that thecorresponding term is not need in the model. In the first case the termis estimated to be so close to zero that it makes no sense to compute ap-value, while in the second case the term has been estimated to benon-zero, but not *significantly* different from zero at any level.

3)Third question: Not linked to random effects but rather to what
the two models gm and gm1 are actually testing. From my
understanding, the first model creates a 2d-smooth for each level of
my factor variable and test whether those smooth are significantly
different from a straight line.

-- from zero, if I understand the model right.

The second model, also creates 3
smooth: one for the reference level of my Factor variable (group0),
one showing the difference between the reference smooth and the
smooth for group1, one showing the difference between the reference
smooth and the smooth for group 2. The summary(gm1) gives p-values
associated with each of those three smooths and which determine:  if
the reference smooth is different from 0, if the smooth for group1
is different from the reference smooth and  if the smooth for group2
is different from the reference smooth. Do I understand well what
the models are testing?

-- sounds right to me.

The number of "edf" estimated for
te(x1,x2):Factor2 in the gm1 model is 3,013 while it is 19,57 in the
gm model. Does that mean that the difference between the reference
smooth: te(x1,x2) and the smooth for group 2: te(x1,x2, by=Factor2)
is "small" so it can be modeled with only 3 degrees of freedom?

-- "smooth", rather than "small". The difference could be very large,but vary smoothly with x1, x2.

Still, the associated p-value is highly significant?

... since the terms is significantly different from zero.

When comparing
AIC between the gm and gm1 models, I find sometimes that gm1 has a
lower AIC than gm.  How can that be interpreted??

-- this is quite common if the differences between the factor levels canbe represented rather smoothly, but the baseline smooth is quitecomplicated. In that case gm1 is much more parsimonious than gm for(approximately) the same fit, since gm requires a complicated smooth foreach level (duplicating effort, in effect). So gm1 has a lower AICbecause its doing the same job with fewer coefficients.


best,
Simon



--
Simon Wood, Mathematical Science, University of Bath BA2 7AY UK
+44 (0)1225 386603               http://people.bath.ac.uk/sw283

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] random effects in library mgcv

Reply via email to