Re: [R] question about the degrees of freedom

David Winsemius Mon, 03 May 2010 08:07:10 -0700


On May 3, 2010, at 10:38 AM, Ista Zahn wrote:

Hi Serdal,
There is a lot of confusion here (how much is yours and how much is
mine remains to be seen). See specific comments in line.


Also inline comments.


On Mon, May 3, 2010 at 9:19 AM, serdal ozusaglam
<saint-fi...@hotmail.com> wrote:


Dear R users,

I think I have a simple question, which I want to explain by an example.

I have several 2-digit industry codes that I want to use for conducting by-industry analysis, but I think there is a problem with the degrees of freedom!

For example, when I do my analysis without any 2-digit industry code, I get the following summary (I have 146574 observations in total):

abc<-lm(lnQ~lnC+lnM+lnL+lnE+eco+inno, data=ds)
summary(abc)


Call:
lm(formula = lnQ ~ lnC + lnM + lnL + lnE + eco + inno, data = ds)

Residuals:
     Min        1Q    Median        3Q       Max
-11.01340  -0.17637  -0.02217   0.14974   7.79005

Coefficients:
            Estimate Std. Error  t value Pr(>|t|)
(Intercept) 0.8870369  0.0050646  175.144   <2e-16 ***
lnC         0.0658922  0.0006549  100.614   <2e-16 ***
lnM         0.8027478  0.0006549 1225.764   <2e-16 ***
lnL         0.0173622  0.0004025   43.138   <2e-16 ***
lnE         0.0657710  0.0006745   97.516   <2e-16 ***
ecoTRUE     0.0101649  0.0045892    2.215   0.0268 *
innoTRUE    0.0945100  0.0030317   31.174   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.294 on 146160 degrees of freedom
 (407 observations deleted due to missingness)
Multiple R-squared: 0.9705,     Adjusted R-squared: 0.9705
F-statistic: 8.027e+05 on 6 and 146160 DF,  p-value: < 2.2e-16

As we can see from the last row, there are 146160 DF (407 observations deleted). This is OK!


Usually it is better to make a small example that demonstrates your
issue. I have no idea what these variables are, which makes it harder to
diagnose your problem.

But when I want to use, for example, just one of the industries, let's say just the 11th industry:
1st: I create the dummy for this industry, such as:
ind1=(ind_2d==11)  # so here R is supposed to consider just the 11th industry!!


This makes no sense to me. What are you trying to do here? What is
ind_2d? Are you trying to subset your data.frame? If so, see ?subset,
or ?"["
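If a by-industry fit is the goal, a minimal sketch of the subsetting approach might look like the following. The data are simulated and the names d, x, y and fit11 are hypothetical; only ind_2d is borrowed from the original post, and the three industry codes are made up.

```r
# Simulated stand-in for the real data; all values here are invented.
set.seed(2)
d <- data.frame(
  ind_2d = sample(c(11, 12, 13), 300, replace = TRUE),
  x      = rnorm(300)
)
d$y <- 0.5 + 1.5 * d$x + rnorm(300)

# Fit the model using only the rows that belong to industry 11.
# The subset expression is evaluated inside the data frame d.
fit11 <- lm(y ~ x, data = d, subset = ind_2d == 11)

# Residual df now reflect only that industry's rows:
# (rows with ind_2d == 11) minus (number of estimated coefficients).
sum(d$ind_2d == 11) - length(coef(fit11))
df.residual(fit11)
```

Both of the last two expressions should give the same number, because only the industry-11 rows enter the fit.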


Serdal is just making a logical indicator variable.

abc<-lm(lnQ~lnC+lnM+lnL+lnE+eco+inno+ind, data=ds)
summary(abc)


Call:
lm(formula = lnQ ~ lnC + lnM + lnL + lnE + eco + inno + ind,
   data = ds)

Residuals:
     Min        1Q    Median        3Q       Max
-11.03392  -0.17647  -0.02301   0.14901   7.74957

Coefficients:
             Estimate Std. Error  t value Pr(>|t|)
(Intercept)  0.8980397  0.0050451  178.001  < 2e-16 ***
lnC          0.0672255  0.0006523  103.065  < 2e-16 ***
lnM          0.7990819  0.0006579 1214.596  < 2e-16 ***
lnL          0.0171633  0.0004004   42.870  < 2e-16 ***
lnE          0.0670030  0.0006716   99.770  < 2e-16 ***
ecoTRUE      0.0162249  0.0045672    3.552 0.000382 ***
innoTRUE     0.0966967  0.0030160   32.062  < 2e-16 ***
indTRUE     -0.1251466  0.0031509  -39.717  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2924 on 146159 degrees of freedom
 (407 observations deleted due to missingness)
Multiple R-squared: 0.9709,     Adjusted R-squared: 0.9709
F-statistic: 6.957e+05 on 7 and 146159 DF,  p-value: < 2.2e-16

But as we can see, it again counted all the industries! So the DF is 146159!!!

So I just wonder: where did I make a mistake? Or is there no mistake at all, and I just misunderstood the DF issue?


I think the misunderstanding runs deeper than that. Try creating a
minimal example, and clearly stating a) what you are trying to
accomplish, b) what you tried, and c) what doesn't work as you expect.

I, too, was puzzled by the OP's reaction. Serdal added a single
logical predictor variable to an existing model that already had two
such variables and as a result his degrees of freedom in the model
increased by one and the degrees of freedom in the residuals decreased
by one. Where is the problem? And why wasn't this question posed even
earlier at the point of addition of "eco" and "inno" variables? He
perhaps was expecting that the degrees of freedom in the model would
increase by the number of records that shared an indTRUE value of
TRUE, but that is not the way ordinary regression works. Perhaps he
should do some reading on mixed effects modeling? Or perhaps that is
what his professor or supervisor is hoping he will learn by assigning
this task? Or perhaps he needs to learn to use the anova() function?
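The arithmetic in the two summaries is consistent with this: 146574 - 407 = 146167 rows were used, and 146167 - 7 coefficients = 146160 residual DF in the first model, then 146167 - 8 = 146159 in the second. A small sketch on simulated data shows the same behaviour. The names d, x, y, grp, fit0 and fit1 are hypothetical, and ind only mimics the indicator in the original post.

```r
# Simulated data; nothing here comes from the poster's dataset.
set.seed(1)
n <- 200
d <- data.frame(
  x   = rnorm(n),
  grp = sample(c(11, 12, 13), n, replace = TRUE)
)
d$y   <- 1 + 2 * d$x + rnorm(n)
d$ind <- d$grp == 11               # logical indicator, like ind_2d == 11

fit0 <- lm(y ~ x, data = d)        # intercept + x: 2 coefficients
fit1 <- lm(y ~ x + ind, data = d)  # adds indTRUE: 3 coefficients

df.residual(fit0)  # n - 2
df.residual(fit1)  # n - 3, regardless of how many rows have ind == TRUE

# anova() compares the two nested models; the added term costs 1 df.
anova(fit0, fit1)
```

The indicator costs one degree of freedom because it adds one coefficient to estimate; every row still contributes to the fit.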


Best,
Ista

--
David Winsemius, MD
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
