On Jul 13, 2009, at 9:31 PM, kfort...@email.unc.edu wrote:
Hello All, Thank you for taking my question. I am looking for
information on how R handles interaction terms in a multiple
regression using the “lm” command. I originally noticed something
was unusual when my R output did not match the output from JMP for
an identical test run previously. Both programs give identical
results for the main test, and if the models do not contain the
interaction term, the output is identical. However, the results
of the partial F tests differ dramatically when the interaction term
is included.
The interpretation of the coefficients and partial F-tests for
individual terms of a model involving interactions is at the very
least difficult, and I have been advised by my statistical betters
simply not to attempt it. Instead, compare the overall model
statistics and, while paying careful attention to the coding of
terms, create predictions for combinations of variables.
Here are the results from R of the test with the interaction:
summary(lm(TD[Year==2007]~Kd[Year==2007]*area[Year==2007],
data=boon_tot))
Call:
lm(formula = TD[Year == 2007] ~ Kd[Year == 2007] * area[Year ==
2007], data = boon_tot)
Residuals:
     Min       1Q   Median       3Q      Max
-0.42696 -0.25648 -0.11960  0.03151  1.27957
Coefficients:
                                    Estimate Std. Error t value Pr(>|t|)
(Intercept)                           5.5714     1.7995   3.096   0.0148 *
Kd[Year == 2007]                      0.2867     4.0696   0.070   0.9456
area[Year == 2007]                    0.8192     0.2874   2.851   0.0215 *
Kd[Year == 2007]:area[Year == 2007]  -1.8074     0.6320  -2.860   0.0211 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.5238 on 8 degrees of freedom
Multiple R-squared: 0.6826, Adjusted R-squared: 0.5636
F-statistic: 5.736 on 3 and 8 DF, p-value: 0.02155
Here are the results from JMP for the same model
Source     df   SS           MS           F            p
Model       3   4.72157318   1.57385773   5.73591141   0.02155127
Error       8   2.19509349   0.27438669
C. Total   11   6.91666667
Source                          Est.          Std Error     t value      p > t
Intercept                       10.4933505    1.24016642    8.46124381   0.00002911
Kd                              -11.213166    2.95096414    -3.7998315   0.00523792
area (ha)                       0.04560254    0.03069489    1.48567197   0.17567049
(Kd-0.428)*(area (ha)-6.3625)   -1.8074455    0.63195669    -2.860078    0.02114887
    ^^^^^             ^^^^^^
This suggests that JMP has automatically centered the variables prior
to forming the interaction term. What's not so clear is whether the
other terms may have been centered as well.
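One way to probe this from the two printed tables alone, offered as a rough sketch rather than a statement of what JMP does internally: if only the product term were centered, then expanding b3*(Kd - k)*(area - a) implies that R's coefficients should be intercept = b0 + b3*k*a, Kd = b1 - b3*a, and area = b2 - b3*k. The variable names below are mine; the numbers are copied from the JMP output above.

```r
# JMP estimates (copied from the JMP table) and its centering constants
b0 <- 10.4933505   # Intercept
b1 <- -11.213166   # Kd
b2 <- 0.04560254   # area (ha)
b3 <- -1.8074455   # (Kd-0.428)*(area (ha)-6.3625)
k  <- 0.428        # constant JMP subtracted from Kd
a  <- 6.3625       # constant JMP subtracted from area (ha)

# Expand b3 * (Kd - k) * (area - a) and collect the terms in
# 1, Kd, area and Kd:area. This assumes the main-effect columns
# themselves were NOT centered.
r_equiv <- c(intercept = b0 + b3 * k * a,
             Kd        = b1 - b3 * a,
             area      = b2 - b3 * k,
             Kd_area   = b3)
print(round(r_equiv, 4))
```

The four values agree with the R estimates (5.5714, 0.2867, 0.8192, -1.8074) to the printed precision, which is consistent with JMP centering only the product term and leaving the main-effect columns as they were.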
As you can see, although the results of the main test and the
interaction term are identical, the estimates and standard errors of
the other factors are very different.
The real question would be whether they give identical predictions and
what the differences between model statistics show when the simpler
models are compared with the more complex ones. You have not yet
looked at this question in detail, although the information is
available in the outputs alluded to below.
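A minimal sketch of such a comparison in R follows. The data frame `d` is simulated and purely hypothetical, since the contents of boon_tot are not shown in this thread; only the shape of the comparison is meant to carry over.

```r
# Simulated stand-in for boon_tot (hypothetical values, 12 rows)
set.seed(42)
d <- data.frame(Kd = runif(12, 0.2, 0.7), area = runif(12, 1, 12))
d$TD <- 6 + 0.8 * d$area - 1.8 * d$Kd * d$area + rnorm(12, sd = 0.5)

fit_add <- lm(TD ~ Kd + area, data = d)   # simpler model, no interaction
fit_int <- lm(TD ~ Kd * area, data = d)   # adds the Kd:area term

# Partial F test for adding the interaction to the additive model
cmp <- anova(fit_add, fit_int)
print(cmp)

# With a single added term, this F is the squared t value of Kd:area
F_int <- cmp$F[2]
t_int <- coef(summary(fit_int))["Kd:area", "t value"]
print(c(F = F_int, t_squared = t_int^2))
```

Because this comparison depends only on the two fitted models and not on how the interaction column is coded, it should come out the same whether or not the product term is centered.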
Additionally if I remove the interaction term from the model, the
two programs then give identical results.
Then JMP must be giving acceptable computations, I suppose.
Any thoughts as to why they differ would be appreciated.
Different codings of the variables in the interaction models. Perhaps
you could create a variable that resembles the JMP interaction term and
see if that is confirmed, or you could review the respective manuals
regarding interactions.
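As a sketch of the first suggestion, assuming JMP subtracts each variable's mean before forming the product (an assumption suggested by the output, not something I have confirmed in the JMP manual), one could build the centered product by hand and fit it alongside the usual R coding. The data frame `d` and the names `Kd_c` and `area_c` are hypothetical.

```r
# Simulated stand-in for boon_tot (hypothetical values, 12 rows)
set.seed(1)
d <- data.frame(Kd = runif(12, 0.2, 0.7), area = runif(12, 1, 12))
d$TD <- 6 + 0.3 * d$Kd + 0.8 * d$area - 1.8 * d$Kd * d$area +
  rnorm(12, sd = 0.5)

# R's default coding: uncentered product Kd * area
fit_raw <- lm(TD ~ Kd * area, data = d)

# JMP-like coding (assumed): uncentered main effects, centered product
d$Kd_c   <- d$Kd - mean(d$Kd)
d$area_c <- d$area - mean(d$area)
fit_jmp  <- lm(TD ~ Kd + area + I(Kd_c * area_c), data = d)

# Same fitted values and same interaction estimate under both codings
same_fit <- all.equal(unname(fitted(fit_raw)), unname(fitted(fit_jmp)))
same_int <- all.equal(unname(coef(fit_raw)["Kd:area"]),
                      unname(coef(fit_jmp)[4]))
print(c(same_fit = same_fit, same_int = same_int))

# The main-effect rows differ, as in the two outputs above
print(rbind(raw = coef(fit_raw)[1:3], jmp_like = coef(fit_jmp)[1:3]))
```

If the same pattern appears with the real data (identical fit and interaction row, different intercept and main-effect rows), that would support the centering explanation.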
--
David Winsemius, MD
Heritage Laboratories
West Hartford, CT