On 13-Aug-09 20:45:31, Ross Culloch wrote: > > Hi all, > > I have an issue with the lm() function regarding the listing of > the coefficients. My data are below, showing a list of hours (HR) > relating to the time spent resting (R) by an individual animal. > Simply i want to run a lm() to run in an anova() to see if there > is a significant difference in resting between hours. > > HR R > 1 2 0.6666667 > 2 2 0.4666667 > 3 2 0.8000000 > 4 2 0.6333333 > 5 2 0.7333333 > 6 2 0.8000000 > 7 2 0.8666667 > 8 2 0.7857143 > 9 2 0.7826087 > 10 2 0.6666667 > 11 2 0.9166667 > 12 2 0.6666667 > 13 3 0.5294118 > 14 3 0.8541667 > 15 3 0.4583333 > 16 3 0.5882353 > 17 3 0.9347826 > 18 3 0.7878788 > 19 3 0.7857143 > 20 3 0.6944444 > 21 3 0.8333333 > 22 3 0.7450980 > 23 3 0.9230769 > 24 3 0.7222222 > 25 4 0.6571429 > 26 4 0.7241379 > 27 4 0.7391304 > 28 4 0.6571429 > 29 4 0.8000000 > 30 4 0.9130435 > 31 4 0.7187500 > 32 4 0.8437500 > 33 4 0.9230769 > 34 4 0.8571429 > 35 4 0.8695652 > 36 4 0.8888889 > 37 5 0.3333333 > 38 5 0.5365854 > 39 5 0.6774194 > 40 5 0.7142857 > 41 5 0.6904762 > 42 5 0.5483871 > 43 5 0.5952381 > 44 5 0.4166667 > 45 5 0.5666667 > 46 5 0.5952381 > 47 5 0.7894737 > 48 5 0.7500000 > 49 6 0.6268657 > 50 6 0.7187500 > 51 6 0.5500000 > 52 6 0.7164179 > 53 6 0.7656250 > 54 6 0.5869565 > 55 6 0.7164179 > 56 6 0.7031250 > 57 6 0.7230769 > 58 6 0.7462687 > 59 6 0.9200000 > 60 6 0.8536585 > 61 7 0.6379310 > 62 7 0.5357143 > 63 7 0.5227273 > 64 7 0.8000000 > 65 7 0.6724138 > 66 7 0.7083333 > 67 7 0.7241379 > 68 7 0.6938776 > 69 7 0.6545455 > 70 7 0.7931034 > 71 7 0.7560976 > 72 7 0.8684211 > 73 8 0.6727273 > 74 8 0.6000000 > 75 8 0.8333333 > 76 8 0.8181818 > 77 8 0.7818182 > 78 8 0.7647059 > 79 8 0.5818182 > 80 8 0.5918367 > 81 8 0.7450980 > 82 8 0.7818182 > 83 8 0.8048780 > 84 8 0.8684211 > > > The script i'm using and output is as follows: > >> anova(rdayml <- lm(R ~ HR, data=rdata2, na.action=na.exclude)) > Analysis of Variance Table > > Response: R > Df Sum Sq Mean Sq F value Pr(>F) > HR 6 0.25992 0.04332 3.1762 0.00774 ** > Residuals 77 1.05021 0.01364 > --- > Signif. codes: 0 â***â 0.001 â**â 0.01 â*â 0.05 â.â > 0.1 â â 1 >> >> summary(rdayml <- lm(R ~ HR,data=rdata2)) > > Call: > lm(formula = R ~ HR, data = rdata2) > > Residuals: > Min > 1Q Median 3Q Max > -0.279725 -0.065416 0.005593 0.077486 0.201070 > > Coefficients: > Estimate Std. Error t value Pr(>|t|) > (Intercept) 0.732082 0.033713 21.715 <2e-16 *** > HR3 0.005976 0.047678 0.125 0.9006 > HR4 0.067232 0.047678 1.410 0.1625 > HR5 -0.130935 0.047678 -2.746 0.0075 ** > HR6 -0.013152 0.047678 -0.276 0.7834 > HR7 -0.034807 0.047678 -0.730 0.4676 > HR8 0.004971 0.047678 0.104 0.9172 > ---> Signif. codes: 0 â***â 0.001 â**â .01 â*â 0.05 â.â > 0.1 â â 1 > > Residual standard error: 0.1168 on 77 degrees of freedom > Multiple R-squared: 0.1984, Adjusted R-squared: 0.1359 > F-statistic: 3.176 on 6 and 77 DF, p-value: 0.00774 > > > What i really don't understand is why the lm summary lists the hour > numbers in the coefficient of the lm, as apposed to just reading HR?
What has apparently happened here is that HR has been read in as a factor, nor a numerical covariate (which is probably what you wanted anyway, since you seem to want to compare hours). Hence HR is a factor with 7 levels: "2","3","4","5","6","7","8" (I have wirtten them between "" to show that they will not be interpreted numerically). When you get a coefficient from lm(), e.g. for HR6, this is the coefficient for level "6" of HR, and that is the standard way of writing a factor with its level in the output of ln(). > On top of that if R does display the data like this then i don't > understand why it omits hour 2? This is because your have the Intercept in the model, so along with the 7 levels of HR you now have 8 variables present for a model with 7 coefficients. So one has to go (being redundant). Conventionally this is the first level of the factor (though you can arrange it differently if you want to). Hence HR2 is, in effect, absorbed into the Intercept. In this case, the Intercept is equal to the mean of the values for Hour 2. If you were to add the coefficint for Intercept to the coefficient for any level of HR, e.g. Intercept HR5 (0.732082) + (-0.130935) = 0.601147 you should find that it is the same as the mean of the values in Hour 5: (0.3333333+0.5365854+0.6774194+0.7142857+0.6904762+0.5483871+ 0.5952381+0.4166667+0.5666667+0.5952381+0.7894737+0.7500000)/12 = 0.6011475 as indeed it is! The coefficient of a given level of HR is here the difference between the mean for that hour and the mean for Hour 2. This depends on the system of contrasts ... by default, lm() uses "treatment contrasts" for factors, which express the difference between each "treatment" and the "base level" treatment. Alternativve systems of cintrasts give different representations. > If i can get this to work correctly > can I use the p value to determine which of the hours is significantly > different to the others - so in this example hour 5 is significantly > different? Or is it just a case of using the p value from the anova > to determine that there is a significant difference between hours > (in this case) and use a plot to determine which hour(s) are likely > to be the cause? Don't trust the individual P-values in the first instance, since with several levels somce are bound to have more extreme P-values than others. However, your anova() test is an overall test for whether one or more of the means jointly depart significantly from the Null Hypothesis that they are all equal. Here, your anova() has given a significant result, so the hour means are not all equal. *Now* you can look at the individual P-values to see where this may be coming from. In the list of coefficients, there is only one candidate for "being different from Intercept", and this is Hour 5. Indeed the P-value for HR5 and the P-value for the anova() are nearly equal (the latter being slightly larger, allowing for the fact that it is 1 out of 7). Hoping this helps! Ted. -------------------------------------------------------------------- E-Mail: (Ted Harding) <ted.hard...@manchester.ac.uk> Fax-to-email: +44 (0)870 094 0861 Date: 13-Aug-09 Time: 22:53:22 ------------------------------ XFMail ------------------------------ ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.