On Wed, 6 May 2009, jjh21 wrote:


Hello,

I have been using the Hmisc package's deff() command for some research with
clustered data. I noticed that the formula to calculate the design effect
seems a bit different. The formula for the DE is:

1 + rho*(B - 1)

In most resources I have seen, B is simply the average number of
observations per cluster: n/k, where n is the total sample size and k is
the number of clusters.

However, the deff() command calculates B as: sum(number of observations in
each cluster^2)/n.

That is a bit hard to write without the Sigma operator. In English it is
"squaring the number of observations in each cluster, adding all those up,
and dividing that total by n."
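A minimal Python sketch of the two candidate definitions of B, to make the comparison concrete. The function names and the cluster sizes are hypothetical. This follows the description of deff() in the question and is not Hmisc code, which is written in R.

```python
from collections import Counter

def cluster_sizes(cluster_ids):
    """Count the observations in each cluster (in order of first appearance)."""
    return list(Counter(cluster_ids).values())

def b_average(sizes):
    """B as the plain average cluster size, n / k."""
    return sum(sizes) / len(sizes)

def b_sum_of_squares(sizes):
    """B as described for deff(): sum of squared cluster sizes divided by n."""
    n = sum(sizes)
    return sum(m * m for m in sizes) / n

def design_effect(rho, b):
    """Design effect 1 + rho * (B - 1)."""
    return 1 + rho * (b - 1)

# Hypothetical data: three clusters of sizes 2, 3 and 10 (n = 15, k = 3).
ids = ["a"] * 2 + ["b"] * 3 + ["c"] * 10
sizes = cluster_sizes(ids)
print(b_average(sizes))         # 5.0
print(b_sum_of_squares(sizes))  # 113 / 15, about 7.53
print(design_effect(0.1, b_average(sizes)), design_effect(0.1, b_sum_of_squares(sizes)))
```

The two definitions agree when every cluster has the same size (for sizes [5, 5, 5] both give 5.0) and diverge as the sizes become more unequal.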

Which formula is correct? Thank you!

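The formula in Hmisc is correct (if the correlation doesn't vary with the cluster size). Think of the formula for the variance of a sum: it involves adding up all the variances and covariances. Writing m for the size of a single cluster (k is already the number of clusters), a cluster of size m has m^2-m covariances between its members, so the total number of covariances is sum(m^2-m) over all the clusters, plus sum(m) = n variances. With a common variance sigma^2 and correlation rho, the variance of the total is sigma^2*(n + rho*(sum(m^2) - n)); dividing by n*sigma^2, its value for independent observations, gives 1 + rho*(sum(m^2)/n - 1). That is, B = sum(m^2)/n.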
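The counting argument can be checked by brute force. The sketch below (hypothetical names; it assumes unit variance, a common within-cluster correlation rho and independence between clusters) adds up every variance and covariance term for the sum of all observations and compares the result with the closed form.

```python
def var_of_sum_bruteforce(sizes, rho, sigma2=1.0):
    """Add up every variance and covariance term for the sum of all observations.

    Assumes a common correlation rho within each cluster and independence
    between clusters.
    """
    # Label each observation with the index of its cluster.
    labels = [c for c, m in enumerate(sizes) for _ in range(m)]
    total = 0.0
    for i, ci in enumerate(labels):
        for j, cj in enumerate(labels):
            if i == j:
                total += sigma2        # one of the n variance terms
            elif ci == cj:
                total += rho * sigma2  # one of the m^2 - m within-cluster covariances
            # pairs from different clusters contribute nothing
    return total

def design_effect_closed_form(sizes, rho):
    """1 + rho * (B - 1) with B = sum(m^2) / n."""
    n = sum(sizes)
    b = sum(m * m for m in sizes) / n
    return 1 + rho * (b - 1)

sizes = [2, 3, 10]
rho = 0.1
n = sum(sizes)
# For independent observations with unit variance, the variance of the sum is n.
deff_bruteforce = var_of_sum_bruteforce(sizes, rho) / n
print(deff_bruteforce, design_effect_closed_form(sizes, rho))
```

The two printed numbers agree up to floating-point rounding, which is the point of the argument: the m^2 terms come directly from counting covariances.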

Another way to think of it is that the larger clusters get too much weight: in addition to the rho*(B-1) factor that you would have for equal-sized clusters, there is an additional loss of efficiency from giving too much weight to the larger clusters. Correspondingly, sum(m^2)/n is never smaller than n/k, and the two are equal only when all clusters are the same size.
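One way to see the extra loss numerically: for fixed n and k, sum(m^2)/n equals n/k plus k times the (population) variance of the cluster sizes divided by n, so the penalty is zero only for equal sizes. The sketch below checks this identity on a few hypothetical sets of cluster sizes that all have n = 15 and k = 3.

```python
def b_sum_squares(sizes):
    """B = sum of squared cluster sizes divided by n."""
    n = sum(sizes)
    return sum(m * m for m in sizes) / n

def b_decomposed(sizes):
    """n/k plus a penalty that is zero only when all clusters are the same size."""
    n, k = sum(sizes), len(sizes)
    mean = n / k
    pop_var = sum((m - mean) ** 2 for m in sizes) / k  # population variance of sizes
    return mean + k * pop_var / n

# Same n = 15 and k = 3, increasingly unequal cluster sizes.
for sizes in ([5, 5, 5], [4, 5, 6], [2, 3, 10], [1, 1, 13]):
    n, k = sum(sizes), len(sizes)
    print(sizes, n / k, b_sum_squares(sizes), b_decomposed(sizes))
```

The average cluster size n/k stays at 5 throughout, while B grows as the sizes spread out, which is the extra loss of efficiency described above.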

        -thomas

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help