Hi,

the standard errors of the coefficients in two regressions that I computed
by hand and using lm() differ by about 1%. Can somebody help me to identify
the source of this difference? The coefficient estimates are the same, but
the standard errors differ. 

####Simulate data
        
        happiness=0
        income=0
        gender=(rep(c(0,1,1,0),25))
                for(i in 1:100){
                        happiness[i]=1000+i+rnorm(1,0,40)
                        income[i]=2*i+rnorm(1,0,40)
                        }

####Run lm()

        reg=lm(happiness~income+factor(gender))
        summary(reg)

####Compute coefficient estimates "by hand"

        x=cbind(income,gender)
        y=happiness

        z=as.matrix(cbind(rep(1,100),x))
        beta=solve(t(z)%*%z)%*%t(z)%*%y

####Compare estimates

        cbind(reg$coef,beta)  ##fine so far, they both look the same
        
        reg$coef[1]-beta[1]
        reg$coef[2]-beta[2]
        reg$coef[3]-beta[3]     ##differences are too small to cause a 1%
difference

####Check predicted values

        estimates=c(beta[2],beta[3])

        predicted=estimates%*%t(x)
        predicted=as.vector(t(as.double(predicted+beta[1])))
        
        cbind(reg$fitted,predicted)             ##inspect fitted values
        as.vector(reg$fitted-predicted) ##differences are marginal

#### Compute errors

        e=NA
        e2=NA
        for(i in 1:length(happiness)){
                e[i]=y[i]-predicted[i]   ##for "hand-computed" regression
                e2[i]=y[i]-reg$fitted[i] ##for lm() regression
        }

#### Compute standard error of the coefficients

  sqrt(abs(var(e)*solve(t(z)%*%z)))     ##for "hand-computed" regression
  sqrt(abs(var(e2)*solve(t(z)%*%z)))    ##for lm() regression using e2 from
above

        ##Both are the same

####Compare to lm() standard errors of the coefficients again

        summary(reg)


The diagonal elements of the variance/covariance matrices should be the
standard errors of the coefficients. Both are identical when computed by
hand. However, they differ from the standard errors reported in
summary(reg). The difference of 1% seems nonmarginal. Should I have
multiplied var(e)*solve(t(z)%*%z) by n and divided by n-1? But even if I do
this, I still observe a difference. Can anybody help me out what the source
of this difference is?

Cheers,
Daniel


-------------------------
cuncta stricte discussurus

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to