> So, there are at least two points of confusion here, one is 
> how coef() differs from effects() in the case of fractional 
> factorial experiments, and the other is the factor 1/4 
> between the coefficients used by Wu & Hamada and the values 
> returned by effects() as I would think from theory I've read 
> that it should be a factor 2.

Some observations to throw into the mix here.

First, the model.
 
> leaf.lm  <- lm(yavg ~ B * C * D * E * Q, data=leaf)

is indeed unnecessarily complicated. You can request all second order 
interactions (accepting that some will be NA) with 
> leaf.lm  <- lm(yavg ~ ( B + C + D + E + Q)^2, data=leaf)

or of course you can specify only the ones you know will be present:
> leaf.lm  <- lm(yavg ~ B + C + D + E + Q + B:C + B:D + B:E + B:Q + C:Q + D:Q 
> + E:Q, data=leaf)

I cheated a bit: 
> leaf.avg <- leaf[, c(1:5, 9)]
>  leaf.lm.avg  <- lm(yavg ~ (.)^2, data=leaf.avg)
        #less typing, same model (with NAs for aliased interactions)

Now let's look at those effects.
First, look at the model.matrix for leaf.lm. 

> model.matrix(leaf.lm)

This shows you what your yavg is actually being regressed against - it's a 
matrix of dummy codings for your (character) levels "+" and "-". 
Notice the values for B (under "B+") and the intercept. The first row (which 
has "-" for B in the leaf data set) has a 0 in the B column; the second has a 
1. So in this matrix, 1 corresponds to "+" and 0 corresponds to "-", which you 
can read as "not different from the intercept". This arises from the way the 
contrasts have been chosen: 'treatment contrasts', R's default for factors. 
Much the same goes for all the other factors. This structure corresponds to 
measuring the 'alternate' level of every factor - always "+", because the 
first level is "-" - against the intercept. What the intercept represents is 
not immediately obvious if you're used to the mean of the study as the 
intercept; here it estimates the response when all factors are at their 
baseline ('zero') level. That's really useful if you want to test a series of 
alternate treatments against a control, and it's R's default. But it's not 
what most industrial experiments based on 2-level fractional factorials want; 
they usually want differences between levels. That is why some folk would call 
R's defaults 'odd defaults'. 
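
To see that treatment-contrast coding in isolation, here is a minimal sketch 
using a made-up two-level factor (the names "a" and "b" and the data are 
purely illustrative, not from the leaf data set):

```r
# Hypothetical two-level factor; levels sort alphabetically to c("a", "b")
f <- factor(c("a", "b", "a", "b"))

# With R's default treatment contrasts, the first level "a" is the
# baseline (absorbed into the intercept) and the second level "b"
# gets a 0/1 dummy column: 0 for "a" rows, 1 for "b" rows
model.matrix(~ f)
```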

To get something closer to the usual fractional factorial usage, you need to 
change those dummy codings from (0, 1) for "-", "+" to something like (+1, 
-1). You can change that using contrasts. To get the coding most industrial 
experimenters would expect, we use sum-to-zero contrasts. So we do this 
instead (using my leaf.avg data frame above):

> options(contrasts=c("contr.sum", "contr.poly"))
> (leaf.lm.sum <- lm(yavg ~ (.)^2, data=leaf.avg))

Now we can see that the coefficient for B is indeed half the difference between 
levels, as you might have expected. And the intercept is now equal to the mean 
of the data, as you might expect from much of the industrial experiment design 
literature. 
So - why half, and why is it negative?

Look at the new model matrix:
> model.matrix(leaf.lm.sum)

This time, the column for B is full of +1 and -1, instead of 0 and 1. So 
first, B+ and B- are both different (in equal and opposite directions) from 
the intercept. 
Second, the range allocated to B is (+1 - -1) = 2, so the change in yavg per 
_unit_ change in B (the lm coefficient for B) is half the difference between 
levels for B. 
Finally, look at the particular allocation of model matrix values for B. The 
first row has +1, the second row -1. That's because contr.sum has allocated +1 
to the first level of the factor B, which (if you look at levels(leaf$B)) is 
"-" because factor() uses a default alphabetic ordering. (You could change 
that for all your factors if you wanted, for example with leaf$B <- 
factor(leaf$B, levels=c("+","-")), and you'd then have +1 for "+" and -1 for 
"-".) But in the meantime, lm has been told that "-" corresponds to an 
_increase_ in the dummy coding for B. Since the mean for B=="-" is lower than 
the intercept, you get a negative coefficient.
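
Both points - intercept equal to the grand mean, coefficient equal to half 
the difference between levels, with sign set by which level gets the +1 
coding - can be checked on a small made-up example (factor names and 
responses invented for illustration):

```r
# Two-level factor with invented responses; group means are 11 and 21
f <- factor(c("a", "a", "b", "b"))
y <- c(10, 12, 20, 22)

# contr.sum codes the first level "a" as +1 and the second level "b" as -1
fit <- lm(y ~ f, contrasts = list(f = contr.sum))
coef(fit)
# (Intercept) is 16, the grand mean of y.
# f1 is -5: half the "a" minus "b" difference, (11 - 21)/2, and negative
# because the lower-mean level "a" received the +1 coding.
```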

effects() is doing something quite different; the help page tells you what it 
is doing mathematically. It has an indirect physical interpretation: for 
simple designs like this, the effects() output for a coefficient comes out as 
that coefficient multiplied by sqrt(n), where n is the number of observations 
in the experiment. But it is not giving you the effect of a factor in the 
sense of the effect of a change in factor level on the mean response.
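
A quick sketch of that sqrt(n) relationship, using an invented balanced 
+/-1 predictor rather than the leaf data (the sign of the ratio depends on 
the QR decomposition, so only its magnitude is checked):

```r
set.seed(1)
x <- rep(c(-1, 1), each = 4)          # balanced +/-1 coding, n = 8
y <- 5 + 2 * x + rnorm(8, sd = 0.1)   # invented response
fit <- lm(y ~ x)

# The orthogonalised column for x has length sqrt(n), so the QR-based
# effect is the lm coefficient scaled by sqrt(8), up to sign
abs(effects(fit)[["x"]] / coef(fit)[["x"]])   # approximately sqrt(8)
```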

Hope some of that is useful to you.

S Ellison


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
