I think that the main problem here is that smooths are not constrained to pass through the origin, so the covariate taking the value zero doesn't correspond to no effect in the way that you would like it to. Another way of putting this is that smooths are translation invariant: you get essentially the same inference from the model y_i = f(x_i) + e_i as from y_i = f(x_i + k) + e_i (which implies that x_i=0 can have no special status).
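
The translation invariance can be checked numerically. The following is only a minimal sketch on simulated data: the test function, the noise level and the shift k = 10 are arbitrary assumptions for illustration, not anything from the original question. It fits the same smooth to x and to x + k and compares the fitted values.

```r
library(mgcv)

set.seed(42)
n <- 300
x <- runif(n)                                # covariate on [0, 1]
y <- sin(2 * pi * x) + rnorm(n, sd = 0.3)    # hypothetical truth plus noise

k  <- 10                                     # arbitrary shift (assumption)
xk <- x + k                                  # translated covariate

fit0 <- gam(y ~ s(x),  method = "REML")      # model in terms of x
fitk <- gam(y ~ s(xk), method = "REML")      # same model in terms of x + k

# Largest difference between the two sets of fitted values;
# this should be negligible if the smooth is translation invariant.
max_diff <- max(abs(fitted(fit0) - fitted(fitk)))
max_diff
```

If the two fits agree, then nothing in the fit can depend on where zero happens to lie on the covariate axis.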

All mgcv does in the case of te(a) + te(b) + te(d) + te(a, b) + te(a, d) is to remove the bases for te(a), te(b) and te(d) from the basis of te(a,b) and te(a,d). Further constraining te(a,b) and te(a,d) so that te(0,b) = te(a,0) = 0 etc. wouldn't make much sense (in general 0 might not even be in the range of a and b).
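
As a rough illustration of this kind of main effect plus interaction decomposition, here is a sketch on simulated data. It assumes a reasonably recent mgcv in which ti() terms are available; ti() is not mentioned above, and is used here only because it builds interaction smooths with the marginal main effects excluded. The data frame dat, the linear predictor eta and the sample size are hypothetical.

```r
library(mgcv)

set.seed(1)
n   <- 1000
dat <- data.frame(a = runif(n), b = runif(n), d = runif(n))

# Hypothetical true linear predictor with a:b and a:d interactions (assumption)
eta <- sin(2 * pi * dat$a) + dat$b + 2 * dat$a * dat$b - dat$a * dat$d
dat$outcome <- rbinom(n, 1, plogis(eta))

# Main effects plus interactions; ti() excludes the marginal main
# effects from each interaction term (assumes ti() is available).
fit <- gam(outcome ~ ti(a) + ti(b) + ti(d) + ti(a, b) + ti(a, d),
           family = binomial, data = dat, method = "REML")

# Evaluate each term along a, holding b = 0 and d = 0
nd <- data.frame(a = seq(0, 1, length.out = 5), b = 0, d = 0)
tm <- predict(fit, newdata = nd, type = "terms")
tm   # one column per smooth term, on the linear predictor scale
```

The columns for the two interaction terms are not forced to be zero at b = 0 or d = 0. The terms are separated by identifiability constraints over the observed data, not by conditions at the origin, so the effect of a at b = 0 is the main effect column plus the interaction column.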

In general I find functional ANOVA not entirely intuitive to think about, but there is a very good book on it by Chong Gu (Smoothing Spline ANOVA Models, 2002, Springer), and the associated package gss is on CRAN.

best,
Simon



On 07/06/11 17:00, Ben Haller wrote:
Hi!  I'm learning mgcv, and reading Simon Wood's book on GAMs, as
recommended to me earlier by some folks on this list.  I've run into
a question to which I can't find the answer in his book, so I'm
hoping somebody here knows.

My outcome variable is binary, so I'm doing a binomial fit with
gam().  I have five independent variables, all continuous, all
uniformly distributed in [0, 1].  (This dataset is the result of a
simulation model.)  Let's call them a,b,c,d,e for simplicity.  I'm
interested in interactions such as a*b, so I'm using tensor product
smooths such as te(a,b).  So far so good.  But I'm also interested
in, let's say, a*d.  So ok, I put te(a,d) in as well.  Both of these
have a as a marginal basis (if I'm using the right terminology; all I
mean is, both interactions involve a), and I would have expected them
to share that basis; I would have expected them to be constrained
such that the effect of a when b=0, for one, would be the same as the
effect of a when d=0, for the other.  This would be just as, in a GLM
with formula a*b + a*d, that formula would expand to a + b + d + a:b
+ a:d, and there is only one "a"; a doesn't get to be different for
the a*b interaction than it is for the a*d interaction.  But with
tensor product smooths in gam(), that does not seem to be the case.
I'm still just getting to know mgcv and experimenting with things, so
I may be doing something wrong; but the plots I have done of fits of
this type appear to show different marginal effects.

I tried explicitly including terms for the marginal basis; in my
example, I tried a formula like te(a) + te(b) + te(d) + te(a, b) +
te(a, d).  No dice; in this case, the main effect of a is different
between all three places where it occurs in the model.  I.e. te(a)
shows a different effect of a than te(a, b) shows at b=0, which is
again different from the effect shown by te(a, d) at d=0.  I don't
even know what that could possibly mean; it seems wrong to me that
this could even be the case, but what do I know.  :->

I could move up to a higher-order tensor like te(a,b,d), but there
are three problems with that.  One, the b:d interaction (in my
simplified example) is then also part of the model, and I'm not
interested in it.  Two, given the set of interactions that I *am*
interested in, I would actually be forced to do the full five-way
te(a,b,c,d,e), and with a 300,000 row dataset, I shudder to think how
long that will take to run, since it would have something like 5^5
free parameters to fit; that doesn't seem worth pursuing.  And three,
interpretation of a five-way interaction would be unpleasant, to say
the least; I'd much rather be able to stay with just the two-way (and
one three-way) interactions that I know are of interest (I know this
from previous logistic regression modelling of the dataset).

For those who like to see the actual R code, here are two fits I've
tried:

gam(outcome ~ te(acl, dispersal) + te(amplitude, dispersal) +
te(slope, curvature, amplitude), family=binomial, data=rla,
method="REML")

gam(outcome ~ te(slope) + te(curvature) + te(amplitude) + te(acl) +
te(dispersal) + te(slope, curvature) + te(slope, amplitude) +
te(curvature, amplitude) + te(acl, dispersal) + te(amplitude,
dispersal) + te(slope, curvature, amplitude), family=binomial,
data=rla, method="REML")

So.  Any advice?  How can I correctly do a gam() fit involving
multiple interactions that involve the same independent variable?

Thanks!

Ben Haller McGill University

http://biology.mcgill.ca/grad/ben/

______________________________________________ R-help@r-project.org
mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do
read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Simon Wood, Mathematical Science, University of Bath BA2 7AY UK
+44 (0)1225 386603               http://people.bath.ac.uk/sw283
