gheine wrote on 10/11/2011 02:31:46 PM:
> 
> An organization has asked me to comment on the validity of their
> recent all-employee survey.  Survey responses, by geographic region,
> compared with the total number of employees in each region, were as
> follows:
> 
> > ByRegion
>            All.Employees Survey.Respondents
> Region_1            735                142
> Region_2            500                 83
> Region_3            897                 78
> Region_4            717                133
> Region_5            167                 48
> Region_6            309                  0
> Region_7            806                125
> Region_8            627                122
> Region_9            858                177
> Region_10           851                160
> Region_11           336                 52
> Region_12          1823                312
> Region_13            80                  9
> Region_14           774                121
> Region_15           561                 24
> Region_16           834                134
> 
> How well does the survey represent the employee population?
> Chi-square test says, not very well:
> 
> > chisq.test(ByRegion)
> 
>          Pearson's Chi-squared test
> 
> data:  ByRegion
> X-squared = 163.6869, df = 15, p-value < 2.2e-16
> 
> By striking three under-represented regions (3, 6, and 15), we get
> a more reasonable, although still not convincing, result:
> 
> > chisq.test(ByRegion[setdiff(1:16,c(3,6,15)),])
> 
>          Pearson's Chi-squared test
> 
> data:  ByRegion[setdiff(1:16, c(3, 6, 15)), ]
> X-squared = 22.5643, df = 12, p-value = 0.03166
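
To make the numbers above reproducible, here is a minimal sketch that rebuilds `ByRegion` from the printed counts and reruns both tests. It assumes `ByRegion` was an ordinary data frame with these two columns and the region names as row names; the post does not show how it was created. `chisq.test()` converts a data frame to a matrix and treats it as a 16 x 2 contingency table.

```r
# Rebuild ByRegion from the counts printed above.
# Assumption: the original object was a plain data frame like this one.
ByRegion <- data.frame(
  All.Employees      = c(735, 500, 897, 717, 167, 309, 806, 627,
                         858, 851, 336, 1823, 80, 774, 561, 834),
  Survey.Respondents = c(142, 83, 78, 133, 48, 0, 125, 122,
                         177, 160, 52, 312, 9, 121, 24, 134),
  row.names = paste0("Region_", 1:16)
)

# Pearson test on the full 16 x 2 table.
full <- chisq.test(ByRegion)

# Same test after dropping regions 3, 6 and 15.
reduced <- chisq.test(ByRegion[setdiff(1:16, c(3, 6, 15)), ])

print(full)
print(reduced)
```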


You can't simply eliminate the three regions with the lowest response rates
(3, 6, and 15).  These are the three largest contributors to the
chi-squared statistic, precisely because fewer people in those regions
responded than expected.  In addition, more people than expected responded
in regions 1, 5, and 9.  This should be clear in a bar chart, and the
chi-squared test on the reduced table, still significant at p = 0.032,
confirms it.
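
The claim about which regions drive the statistic can be checked from the test object itself. A minimal sketch, assuming `ByRegion` is the data frame shown in the original post (rebuilt here so the snippet runs on its own): `chisq.test()` stores Pearson residuals, (observed - expected) / sqrt(expected), in its `residuals` component, and summing their squares across a row gives that region's share of X-squared. The names `contrib`, `resid_resp`, `rate`, and `shares` are illustrative only.

```r
# ByRegion as printed in the original post.
ByRegion <- data.frame(
  All.Employees      = c(735, 500, 897, 717, 167, 309, 806, 627,
                         858, 851, 336, 1823, 80, 774, 561, 834),
  Survey.Respondents = c(142, 83, 78, 133, 48, 0, 125, 122,
                         177, 160, 52, 312, 9, 121, 24, 134),
  row.names = paste0("Region_", 1:16)
)

ct <- chisq.test(ByRegion)

# Each region's contribution to X-squared: squared Pearson residuals
# summed over both columns of its row.
contrib <- rowSums(ct$residuals^2)
print(round(sort(contrib, decreasing = TRUE), 1))

# Direction of the departure: negative means fewer respondents than expected.
resid_resp <- ct$residuals[, "Survey.Respondents"]
print(round(resid_resp, 2))

# Raw response rate per region.
rate <- ByRegion$Survey.Respondents / ByRegion$All.Employees
names(rate) <- rownames(ByRegion)
print(round(sort(rate), 3))

# Side-by-side bar chart: each region's share of employees vs. of respondents.
shares <- rbind(
  Employees   = ByRegion$All.Employees / sum(ByRegion$All.Employees),
  Respondents = ByRegion$Survey.Respondents / sum(ByRegion$Survey.Respondents)
)
colnames(shares) <- as.character(1:16)
barplot(shares, beside = TRUE, legend.text = TRUE,
        xlab = "Region", ylab = "Proportion")
```

With these counts, regions 6, 15, and 3 head the `contrib` ranking with negative respondent residuals, followed by regions 5 and 9 with positive ones.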

Jean


> This poses several questions:
> 
> 1)  Looking at a side-by-side barchart (proportion of responses vs.
> proportion of employees, per region), the pattern of survey responses
> appears, visually, to match fairly well the pattern of employees.  Is
> this a case where we trust the numbers and not the picture?
> 
> 2) Part of the problem, ironically, is that there were too many responses
> to the survey.  If we had only one-tenth the responses, but in the same
> proportions by region, the chi-square statistic would look much better
> (though with a warning about possible inaccuracy):
> 
> data:  data.frame(ByRegion$All.Employees, 0.1 * (ByRegion$Survey.Respondents))
> X-squared = 17.5912, df = 15, p-value = 0.2848
> 
> Is there a way of reconciling a large response rate with an unrepresentative
> response profile?  Or is the bad news that the survey will give very precise
> results about a very ill-specified sub-population?
> 
> (Of course, I would put it in softer terms, like "you need to assess the
> degree of homogeneity across different regions".)
> 
> 3) Is chi-squared really the right measure of how representative the
> survey is?
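
Questions 2 and 3 can be explored numerically. A minimal sketch, again rebuilding `ByRegion` from the counts above: it reruns the one-tenth calculation, then scales both columns by a common factor k. Because X-squared is a sum of (O - E)^2 / E terms and each term is multiplied by k when every count is, the statistic grows in proportion to the amount of data while the regional proportions stay fixed. The phi coefficient, sqrt(X^2 / n), which equals Cramér's V for a two-column table, is included only as one example of a measure that does not grow with n; it is not something proposed in this thread. The helpers `x2()` and `phi()` are hypothetical names for illustration.

```r
# ByRegion as printed in the original post.
ByRegion <- data.frame(
  All.Employees      = c(735, 500, 897, 717, 167, 309, 806, 627,
                         858, 851, 336, 1823, 80, 774, 561, 834),
  Survey.Respondents = c(142, 83, 78, 133, 48, 0, 125, 122,
                         177, 160, 52, 312, 9, 121, 24, 134),
  row.names = paste0("Region_", 1:16)
)

# George's calculation: only the respondent counts scaled to one-tenth.
# Expect a "Chi-squared approximation may be incorrect" warning,
# because some expected counts fall below 5.
tenth <- chisq.test(data.frame(ByRegion$All.Employees,
                               0.1 * ByRegion$Survey.Respondents))
print(tenth$statistic)

# Hypothetical helper: X-squared after multiplying every count by k.
x2 <- function(k) unname(chisq.test(k * as.matrix(ByRegion))$statistic)

# Hypothetical helper: phi = sqrt(X^2 / n) for the same scaled table.
phi <- function(k) sqrt(x2(k) / sum(k * as.matrix(ByRegion)))

ks <- c(0.1, 1, 10)
print(sapply(ks, x2))   # grows in proportion to k
print(sapply(ks, phi))  # unchanged by k
```

The first `sapply()` call returns values in the ratio 0.1 : 1 : 10, while the phi values are identical: the test statistic reflects both the size of the departure and the amount of data, and only the latter changes with k.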
> 
> <<<<<<< >>>>>>>>>
> 
> Thanks for any help you can give - hope these questions make sense -
> 
> George H.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.