Re: [R] About the interaction A:B

Jeff Laake Fri, 05 Mar 2010 09:42:54 -0800

On 3/5/2010 9:19 AM, Frank E Harrell Jr wrote:

You neglected to state your name and affiliation, and your questiondemonstrates an allergy to R documentation.
Frank

I agree with Frank but will try to answer some of your questions as Iunderstand it.

First, model.matrix uses the options$contrasts or what is setspecifically as the contrast for a factor using contrasts(). Thedefault istreatment contrasts and for a factor that means the first level of afactor variable is the intercept and the remainder are "treatment" effectswhich are added to intercept. If you specify Y~A+B+A:B, you arespecifying the model with main effects (A, B) and interactions (A:B).If A has m levels and B has n levels then there will be an intercept(1), m-1 for A, n-1 for B and (m-1)*(n-1) for the interaction.If you specify the model as Y~A:B it will specify the model fully withinteractions which will have m*n separate parameters and none will beNA as long as you have data in each of the m*n cells. It makesabsolutely no sense to me to have an intercept and then all but one of the

remaining interactions included.

Note that you'll get quite different results if A and/or B are notfactor variables.


--jeff


blue sky wrote:

The following is the code for the model.matrix. But it still doesn't
answer why A:B is interpreted differently in Y~A+B+A:B and Y~A:B. By
'why', I mean how R internally does it and what is the rational behind
the way of doing it?

And it didn't answer why in the model.matrix of Y~A, there are a-1
terms from A plus the intercept, but in the model.matrix of Y~A:B,
there are a*b terms (rather than a*b-1 terms) plus the intercept.
Since the one coefficient of the lm of Y~A:B is going to be NA, why
bother to include the corresponding term in the model matrix?

############code below

set.seed(0)

a=3
b=4

AB_effect=data.frame(
  name=paste(
    unlist(
      do.call(
        rbind
        , rep(list(paste('A', letters[1:a],sep='')), b)
        )
      )
    , unlist(
      do.call(
        cbind
        , rep(list(paste('B', letters[1:b],sep='')), a)
        )
      )
    , sep=':'
    )
  , value=rnorm(a*b)
  , stringsAsFactors=F
  )

max_n=10
n=sample.int(max_n, a*b, replace=T)

AB=mapply(function(name, n){rep(name,n)}, AB_effect$name, n)

Y=AB_effect$value[match(unlist(AB), AB_effect$name)]

Y=Y+a*b*rnorm(length(Y))

sub_fr=as.data.frame(do.call(rbind, strsplit(unlist(AB), ':')))
rownames(sub_fr)=NULL
colnames(sub_fr)=c('A', 'B')

fr=data.frame(Y=Y,sub_fr)

my_subset=function(amm) {
  coding=apply(
    amm
    ,1
    , function(x) {
      paste(x, collapse='')
    }
    )
  amm[match(unique(coding), coding),]
}

my_subset(model.matrix(Y ~ A*B,fr))
my_subset(model.matrix(Y ~ (A+B)^2,fr))
my_subset(model.matrix(Y ~ A + B + A:B,fr))
my_subset(model.matrix(Y ~ A:B - 1,fr))
my_subset(model.matrix(Y ~ A:B,fr))

On Fri, Mar 5, 2010 at 8:45 AM, Gabor Grothendieck
<ggrothendi...@gmail.com> wrote:

The way to understand this is to look at the output of model.matrix:

model.matrix(fo, fr)

for each formula you tried.  If your data is large you will have to
use a subset not to be overwhelmed with output.

On Fri, Mar 5, 2010 at 9:08 AM, blue sky <bluesky...@gmail.com> wrote:

Suppose, 'fr' is data.frame with columns 'Y', 'A' and 'B'. 'A' haslevels 'Aa'

'Ab' and 'Ac', and 'B' has levels 'Ba', 'Bb', 'Bc' and 'Bd'. 'Y'
columns are numbers.

I tried the following three sets of commands. I understand that A*B is
equivalent to A+B+A:B. However, A:B in A+B+A:B is different from A:B
just by itself (see the 3rd and 4th set of commands). Would you please
help me understand why the meanings of A:B are different in different
contexts?

I also see the coefficient of AAc:BBd is NA (the last set of
commands). I'm wondering why this coefficient is not removed from the
'coefficients' vector. Since lm(Y~A) has coefficients for (intercept),
Ab, Ac (there are no NA's), I think that it is reasonable to make sure
that there are no NA's when there are interaction terms, namely, A:B
in this case.

Thank you for answering my questions!

alm=lm(Y ~ A*B,fr)
alm$coefficients

(Intercept) AAb AAc BBb BBcBBd

 -3.548176   -2.086586    7.003857    4.367800   11.887356   -3.470840
  AAb:BBb     AAc:BBb     AAb:BBc     AAc:BBc     AAb:BBd     AAc:BBd
 5.160865  -11.858425  -12.853116  -20.289611    6.727401   -2.327173

alm=lm(Y ~ A + B + A:B,fr)
alm$coefficients

(Intercept) AAb AAc BBb BBcBBd

 -3.548176   -2.086586    7.003857    4.367800   11.887356   -3.470840
  AAb:BBb     AAc:BBb     AAb:BBc     AAc:BBc     AAb:BBd     AAc:BBd
 5.160865  -11.858425  -12.853116  -20.289611    6.727401   -2.327173

alm=lm(Y ~ A:B - 1,fr)
alm$coefficients

AAa:BBa AAb:BBa AAc:BBa AAa:BBb AAb:BBb AAc:BBbAAa:BBc-3.5481765 -5.6347625 3.4556808 0.8196231 3.8939016 -4.03494498.3391795

 AAb:BBc    AAc:BBc    AAa:BBd    AAb:BBd    AAc:BBd
-6.6005222 -4.9465744 -7.0190168 -2.3782017 -2.3423322

alm=lm(Y ~ A:B,fr)
alm$coefficients

(Intercept) AAa:BBa AAb:BBa AAc:BBa AAa:BBbAAb:BBb-2.34233221 -1.20584424 -3.29243033 5.79801305 3.161955346.23623377

  AAc:BBb     AAa:BBc     AAb:BBc     AAc:BBc     AAa:BBd     AAb:BBd

-1.69261273 10.68151168 -4.25819000 -2.60424217 -4.67668454-0.03586951

  AAc:BBd
       NA


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] About the interaction A:B

Reply via email to