Hi, I am curious about how to interpret the table produced by anova(ols(...)), from the Design package. I have a multiple linear regression model, with some interaction terms, defined by:
ols(formula = log(ksat * 60 * 60) ~ log(sar) * pol(activity, 3) +
    log(conc) * pol(sand, 3), data = sm.clean, x = TRUE, y = TRUE)

      n  Model L.R.  d.f.    R2  Sigma
   1834        1203    14  0.48    1.2

Residuals:
    Min      1Q  Median      3Q    Max
 -5.033  -0.859   0.016   0.739  4.868

Coefficients:
                        Value  Std. Error      t         Pr(>|t|)
Intercept          11.3886790   2.0220171   5.63  0.0000000205580
sar                -4.3991263   1.0157588  -4.33  0.0000156609226
activity          -40.0591221   5.6907822  -7.04  0.0000000000027
activity^2         33.0570116   5.0578520   6.54  0.0000000000819
activity^3         -8.1645147   1.3750370  -5.94  0.0000000034548
conc                0.3841260   0.0813200   4.72  0.0000024942478
sand               -0.0096212   0.0327415  -0.29  0.7689032898947
sand^2              0.0008495   0.0008589   0.99  0.3227487169683
sand^3              0.0000025   0.0000066   0.39  0.6994987342042
sar * activity     12.8134698   2.9513942   4.34  0.0000149300007
sar * activity^2   -9.9981381   2.6310765  -3.80  0.0001494462966
sar * activity^3    2.1481278   0.7168339   3.00  0.0027662261037
conc * sand        -0.0157426   0.0076013  -2.07  0.0384966958735
conc * sand^2       0.0003419   0.0001989   1.72  0.0857381555491
conc * sand^3      -0.0000027   0.0000015  -1.77  0.0777025949762

Looking at what I think are the marginal p-values, i.e. the t tests of the null hypothesis that each individual coefficient is zero, several terms have non-significant coefficients (at p < 0.05). Does a non-significant coefficient warrant removal of that term from the model, or perhaps just a mention in the discussion?

Compared with the coefficient table above, what tests are performed when calling anova() on this object? Here is the output in R:

                Analysis of Variance        Response: log(ksat * 60 * 60)

 Factor                                          d.f. Partial SS      MS      F      P
 sar  (Factor+Higher Order Factors)                 4     168.43   42.11   27.0 <.0001
  All Interactions                                  3     142.13   47.38   30.4 <.0001
 activity  (Factor+Higher Order Factors)            6     536.84   89.47   57.3 <.0001
  All Interactions                                  3     142.13   47.38   30.4 <.0001
  Nonlinear (Factor+Higher Order Factors)           4     257.25   64.31   41.2 <.0001
 conc  (Factor+Higher Order Factors)                4     443.02  110.75   71.0 <.0001
  All Interactions                                  3      76.74   25.58   16.4 <.0001
 sand  (Factor+Higher Order Factors)                6    1906.29  317.71  203.6 <.0001
  All Interactions                                  3      76.74   25.58   16.4 <.0001
  Nonlinear (Factor+Higher Order Factors)           4     263.00   65.75   42.1 <.0001
 sar * activity  (Factor+Higher Order Factors)      3     142.13   47.38   30.4 <.0001
  Nonlinear                                         2      95.32   47.66   30.5 <.0001
  Nonlinear Interaction : f(A,B) vs. AB             2      95.32   47.66   30.5 <.0001
 conc * sand  (Factor+Higher Order Factors)         3      76.74   25.58   16.4 <.0001
  Nonlinear                                         2       4.98    2.49    1.6  0.203
  Nonlinear Interaction : f(A,B) vs. AB             2       4.98    2.49    1.6  0.203
 TOTAL NONLINEAR                                    8     455.20   56.90   36.5 <.0001
 TOTAL INTERACTION                                  6     218.87   36.48   23.4 <.0001
 TOTAL NONLINEAR + INTERACTION                     10     573.36   57.34   36.7 <.0001
 REGRESSION                                        14    2631.53  187.97  120.4 <.0001
 ERROR                                           1819    2839.25    1.56

Are more of the terms significant (at p < 0.05) here because anova() pools related model terms? I have looked through Frank's book on the topic, but can't quite wrap my head around what the table above is telling me. I am mostly interested in presenting the model for use as an applied tool, so interpretation of the terms and interactions is very important.

Thanks,
Dylan
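
P.S. To check my understanding of what anova() is pooling: I assume the
"sand (Factor+Higher Order Factors)" row is a joint test that all 6
coefficients involving sand (the three pol(sand, 3) terms plus their three
interactions with log(conc)) are zero. A rough sketch of the nested-model
comparison I have in mind is below (same data and formula as above, and
assuming lrtest() can be used with ols fits this way):

  ## full model, as fitted above
  full <- ols(log(ksat * 60 * 60) ~ log(sar) * pol(activity, 3) +
              log(conc) * pol(sand, 3), data = sm.clean, x = TRUE, y = TRUE)

  ## same model with every term involving sand removed
  no.sand <- ols(log(ksat * 60 * 60) ~ log(sar) * pol(activity, 3) +
                 log(conc), data = sm.clean, x = TRUE, y = TRUE)

  ## likelihood-ratio test of the 6 pooled sand terms; I assume this should
  ## roughly agree with the pooled F test in the anova() table above
  lrtest(full, no.sand)

Is that roughly the right way to think about the pooled rows?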