On Fri, Mar 5, 2010 at 11:41 AM, Jeff Laake <jeff.la...@noaa.gov> wrote: > On 3/5/2010 9:19 AM, Frank E Harrell Jr wrote: >> >> You neglected to state your name and affiliation, and your question >> demonstrates an allergy to R documentation. >> >> Frank > > I agree with Frank but will try to answer some of your questions as I > understand it. > > First, model.matrix uses the options$contrasts or what is set specifically > as the contrast for a factor using contrasts(). The default is > treatment contrasts and for a factor that means the first level of a factor > variable is the intercept and the remainder are "treatment" effects > which are added to intercept. If you specify Y~A+B+A:B, you are specifying > the model with main effects (A, B) and interactions (A:B).
So my interpretation of how R does internally when dealing with interaction terms A:B is to look if each individual term (each sub interactions in highway interactions like, A:B, A:C, etc., in A:B:C:D) appears in the formula or not, and deciding how to construct the columns of the model matrix according to A:B, right? This seem quite complicated when a formula is arbitrarily complex, let along different type of data (namely ordered, not ordered, numerical numbers). The piece of information that I feel that is still missing to me is the detailed procedure that R generate model.matrix for arbitrary complex formulas and how it is proved the way that R does is always correct for these complex formulas. > If A has m levels and B has n levels then there will be an intercept (1), > m-1 for A, n-1 for B and (m-1)*(n-1) for the interaction. > If you specify the model as Y~A:B it will specify the model fully with > interactions which will have m*n separate parameters and none will be > NA as long as you have data in each of the m*n cells. It makes absolutely > no sense to me to have an intercept and then all but one of the > remaining interactions included. I think that 'Y ~ A:B' indeed has the intercept term unless '-1' is included. You may need to adjust your interpretation of A:B and A:B -1 in these cases. > dim(my_subset(model.matrix(Y ~ A:B - 1,fr))) [1] 12 12 > dim(my_subset(model.matrix(Y ~ A:B,fr))) [1] 12 13 > Note that you'll get quite different results if A and/or B are not factor > variables. > > --jeff >> >> blue sky wrote: >>> >>> The following is the code for the model.matrix. But it still doesn't >>> answer why A:B is interpreted differently in Y~A+B+A:B and Y~A:B. By >>> 'why', I mean how R internally does it and what is the rational behind >>> the way of doing it? >>> >>> And it didn't answer why in the model.matrix of Y~A, there are a-1 >>> terms from A plus the intercept, but in the model.matrix of Y~A:B, >>> there are a*b terms (rather than a*b-1 terms) plus the intercept. >>> Since the one coefficient of the lm of Y~A:B is going to be NA, why >>> bother to include the corresponding term in the model matrix? >>> >>> ############code below >>> >>> set.seed(0) >>> >>> a=3 >>> b=4 >>> >>> AB_effect=data.frame( >>> name=paste( >>> unlist( >>> do.call( >>> rbind >>> , rep(list(paste('A', letters[1:a],sep='')), b) >>> ) >>> ) >>> , unlist( >>> do.call( >>> cbind >>> , rep(list(paste('B', letters[1:b],sep='')), a) >>> ) >>> ) >>> , sep=':' >>> ) >>> , value=rnorm(a*b) >>> , stringsAsFactors=F >>> ) >>> >>> max_n=10 >>> n=sample.int(max_n, a*b, replace=T) >>> >>> AB=mapply(function(name, n){rep(name,n)}, AB_effect$name, n) >>> >>> Y=AB_effect$value[match(unlist(AB), AB_effect$name)] >>> >>> Y=Y+a*b*rnorm(length(Y)) >>> >>> sub_fr=as.data.frame(do.call(rbind, strsplit(unlist(AB), ':'))) >>> rownames(sub_fr)=NULL >>> colnames(sub_fr)=c('A', 'B') >>> >>> fr=data.frame(Y=Y,sub_fr) >>> >>> my_subset=function(amm) { >>> coding=apply( >>> amm >>> ,1 >>> , function(x) { >>> paste(x, collapse='') >>> } >>> ) >>> amm[match(unique(coding), coding),] >>> } >>> >>> my_subset(model.matrix(Y ~ A*B,fr)) >>> my_subset(model.matrix(Y ~ (A+B)^2,fr)) >>> my_subset(model.matrix(Y ~ A + B + A:B,fr)) >>> my_subset(model.matrix(Y ~ A:B - 1,fr)) >>> my_subset(model.matrix(Y ~ A:B,fr)) >>> >>> On Fri, Mar 5, 2010 at 8:45 AM, Gabor Grothendieck >>> <ggrothendi...@gmail.com> wrote: >>>> >>>> The way to understand this is to look at the output of model.matrix: >>>> >>>> model.matrix(fo, fr) >>>> >>>> for each formula you tried. If your data is large you will have to >>>> use a subset not to be overwhelmed with output. >>>> >>>> On Fri, Mar 5, 2010 at 9:08 AM, blue sky <bluesky...@gmail.com> wrote: >>>>> >>>>> Suppose, 'fr' is data.frame with columns 'Y', 'A' and 'B'. 'A' has >>>>> levels 'Aa' >>>>> 'Ab' and 'Ac', and 'B' has levels 'Ba', 'Bb', 'Bc' and 'Bd'. 'Y' >>>>> columns are numbers. >>>>> >>>>> I tried the following three sets of commands. I understand that A*B is >>>>> equivalent to A+B+A:B. However, A:B in A+B+A:B is different from A:B >>>>> just by itself (see the 3rd and 4th set of commands). Would you please >>>>> help me understand why the meanings of A:B are different in different >>>>> contexts? >>>>> >>>>> I also see the coefficient of AAc:BBd is NA (the last set of >>>>> commands). I'm wondering why this coefficient is not removed from the >>>>> 'coefficients' vector. Since lm(Y~A) has coefficients for (intercept), >>>>> Ab, Ac (there are no NA's), I think that it is reasonable to make sure >>>>> that there are no NA's when there are interaction terms, namely, A:B >>>>> in this case. >>>>> >>>>> Thank you for answering my questions! >>>>> >>>>>> alm=lm(Y ~ A*B,fr) >>>>>> alm$coefficients >>>>> >>>>> (Intercept) AAb AAc BBb BBc BBd >>>>> -3.548176 -2.086586 7.003857 4.367800 11.887356 -3.470840 >>>>> AAb:BBb AAc:BBb AAb:BBc AAc:BBc AAb:BBd AAc:BBd >>>>> 5.160865 -11.858425 -12.853116 -20.289611 6.727401 -2.327173 >>>>>> >>>>>> alm=lm(Y ~ A + B + A:B,fr) >>>>>> alm$coefficients >>>>> >>>>> (Intercept) AAb AAc BBb BBc BBd >>>>> -3.548176 -2.086586 7.003857 4.367800 11.887356 -3.470840 >>>>> AAb:BBb AAc:BBb AAb:BBc AAc:BBc AAb:BBd AAc:BBd >>>>> 5.160865 -11.858425 -12.853116 -20.289611 6.727401 -2.327173 >>>>>> >>>>>> alm=lm(Y ~ A:B - 1,fr) >>>>>> alm$coefficients >>>>> >>>>> AAa:BBa AAb:BBa AAc:BBa AAa:BBb AAb:BBb AAc:BBb >>>>> AAa:BBc >>>>> -3.5481765 -5.6347625 3.4556808 0.8196231 3.8939016 -4.0349449 >>>>> 8.3391795 >>>>> AAb:BBc AAc:BBc AAa:BBd AAb:BBd AAc:BBd >>>>> -6.6005222 -4.9465744 -7.0190168 -2.3782017 -2.3423322 >>>>>> >>>>>> alm=lm(Y ~ A:B,fr) >>>>>> alm$coefficients >>>>> >>>>> (Intercept) AAa:BBa AAb:BBa AAc:BBa AAa:BBb AAb:BBb >>>>> -2.34233221 -1.20584424 -3.29243033 5.79801305 3.16195534 6.23623377 >>>>> AAc:BBb AAa:BBc AAb:BBc AAc:BBc AAa:BBd AAb:BBd >>>>> -1.69261273 10.68151168 -4.25819000 -2.60424217 -4.67668454 -0.03586951 >>>>> AAc:BBd >>>>> NA >>>>> >> > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.