Other people have explained that the issue is missing data. I just wanted
to note that the reason svymean() uses only the cases that are complete on
all variables is that it computes the covariance matrix of all the means,
and this can't sensibly be done when the means are based on different
subsets.
-thomas
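A small simulation makes the complete-case behaviour visible. This is a sketch on made-up data: the data frame `toy` and its columns `schid`, `item1`, `item2` and `wt` are hypothetical stand-ins for `lower_dat`, and it assumes the survey package is installed. With `na.rm=TRUE`, a two-item svymean() call should reproduce the plain means over rows observed on both items, a single-item call should reproduce mean(..., na.rm=TRUE), and vcov() shows the joint covariance matrix that needs a common set of rows.

```r
library(survey)  # assumes the survey package is installed

set.seed(1)
n <- 200
# Hypothetical toy data: 20 schools x 10 students, two 0/1 items, equal weights
toy <- data.frame(
  schid = rep(1:20, each = 10),
  item1 = rbinom(n, 1, 0.8),
  item2 = rbinom(n, 1, 0.5),
  wt    = 1
)
# Knock out responses on (mostly) different rows for each item
toy$item1[sample(n, 30)] <- NA
toy$item2[sample(n, 30)] <- NA

des <- svydesign(ids = ~schid, weights = ~wt, data = toy)

# Joint estimate: rows with any missing item are dropped for *both* means
both <- svymean(~item1 + item2, des, na.rm = TRUE)
cc <- complete.cases(toy[, c("item1", "item2")])
print(coef(both))
print(colMeans(toy[cc, c("item1", "item2")]))
print(vcov(both))  # 2 x 2 covariance matrix of the two means

# Single-item estimate: uses every row where item1 is observed
one <- svymean(~item1, des, na.rm = TRUE)
print(coef(one))
print(mean(toy$item1, na.rm = TRUE))
```

Because the two calls use different sets of rows, their item1 means will generally differ, which is the pattern in the output quoted below.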
On Tue, 26 Aug 2008, Doran, Harold wrote:
I have the following code, which produces the output below it:
clus1 <- svydesign(ids = ~schid, data = lower_dat)   # cluster (school) design, no weights
items <- as.formula(paste(" ~ ", paste(lset, collapse= "+")))   # ~ item1 + item2 + ...
rr1 <- svymean(items, clus1, deff='replace', na.rm=TRUE)   # all item means in one call
rr1
            mean       SE   DEff
W525209 0.719748 0.015606 2.4932
W525223 0.508228 0.027570 6.2802
W525035 0.827202 0.014060 2.8561
W525131 0.805421 0.015425 3.1350
W525033 0.242982 0.020074 4.5239
W525163 0.904647 0.013905 4.6289
W525165 0.439981 0.020029 3.3620
W525167 0.148112 0.013047 2.7860
W525177 0.865924 0.014977 3.9898
W525179 0.409003 0.020956 3.7515
W525181 0.634076 0.022076 4.3372
W525183 0.242498 0.019073 4.0894
W525401 0.262343 0.021830 3.4354
W525059 0.854792 0.016551 4.5576
W525251 0.691191 0.025010 6.0512
W525083 0.433204 0.017310 2.5200
W525289 0.634560 0.012762 1.4504
W524763 0.791868 0.014478 2.6265
W524765 0.223621 0.019627 4.5818
W524951 0.242982 0.016796 3.1669
W524769 0.820910 0.016786 3.9579
W524771 0.872701 0.015853 4.6712
W524839 0.518877 0.026433 5.7794
W525374 1.209584 0.043065 5.1572
W524885 0.585673 0.027780 6.5674
W525377 1.100678 0.050093 5.8851
W524787 0.839303 0.012994 2.5852
W524789 0.339787 0.019230 3.4041
W524791 0.847047 0.012885 2.6461
W524825 0.500968 0.021988 3.9935
W524795 0.868345 0.014951 4.0377
W524895 0.864472 0.013872 3.3917
W524897 0.804937 0.020070 5.2977
W524967 0.475799 0.032137 8.5511
W525009 0.681994 0.018670 3.3188
However, when I do the following:
svymean(~W524787, clus1, deff='replace', na.rm=TRUE)
            mean       SE   DEff
W524787 0.855547 0.011365 4.1158
Compare this with the W524787 row, ninth from the bottom of the output
above (mean 0.839303): the two estimates differ.
Computing the mean of the item by itself with svymean agrees with the
sample mean:
mean(lower_dat$W524787, na.rm=TRUE)
[1] 0.8555471
Now, I know that there is a covariance between the variables, but I was
under the impression that the point estimate would still be the ordinary
sample mean, and that accounting for the sample design affects only the
standard error.
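The expectation is right about the design but not about the rows used. As a sketch, assuming svymean()'s point estimate is the weighted (ratio) mean over the set S of rows it keeps, with w_i the sampling weights and y_{ij} the response of row i on item j:

```latex
\hat{\mu}_j \;=\; \frac{\sum_{i \in S} w_i \, y_{ij}}{\sum_{i \in S} w_i},
\qquad
S = \begin{cases}
\{\, i : y_{ij} \text{ observed} \,\} & \text{single-item call},\\[2pt]
\{\, i : y_{ik} \text{ observed for every item } k \,\} & \text{multi-item call with } \texttt{na.rm=TRUE}.
\end{cases}
```

When svydesign() is given no weights, w_i = 1 and this reduces to (1/|S|) \sum_{i \in S} y_{ij}, the ordinary sample mean over S. The clustering by schid enters only the standard error; what changes the point estimate is the choice of S.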
In the work I am doing, it is important for the item means from svymean
to match the sample mean of each item computed by itself. It's a bit of a
story as to why, and I can provide that info if relevant.
I don't see an argument in svydesign or svymean that would let me treat
the variables as independent. But maybe I am missing something, and I
would welcome any reactions.
Harold
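One way to get per-item means that match the single-item sample means is simply to call svymean() once per item, so each mean uses all of that item's observed responses. The helper below is a hypothetical sketch, not part of the survey package; the toy data again stand in for lower_dat, and the per-item calls do not produce the covariances between items.

```r
library(survey)  # assumes the survey package is installed

# Hypothetical helper: one svymean() call per item name, returning a
# matrix with one row per item and columns "mean" and "SE".
item_means <- function(items, design) {
  res <- t(sapply(items, function(v) {
    est <- svymean(reformulate(v), design, na.rm = TRUE)  # reformulate("x") gives ~x
    c(mean = as.numeric(coef(est)), SE = as.numeric(SE(est)))
  }))
  rownames(res) <- items
  res
}

# Hypothetical toy data in place of lower_dat
set.seed(2)
n <- 200
toy <- data.frame(
  schid = rep(1:20, each = 10),
  item1 = rbinom(n, 1, 0.8),
  item2 = rbinom(n, 1, 0.5),
  wt    = 1
)
toy$item1[sample(n, 30)] <- NA
toy$item2[sample(n, 30)] <- NA
des <- svydesign(ids = ~schid, weights = ~wt, data = toy)

out <- item_means(c("item1", "item2"), des)
print(out)
```

With the original objects this would be item_means(lset, clus1), assuming lset is the character vector of item names used to build the formula.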
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Thomas Lumley Assoc. Professor, Biostatistics
[EMAIL PROTECTED] University of Washington, Seattle