Re: [R] (svy)glm and weights question

Thomas Lumley Tue, 11 May 2010 08:57:14 -0700

On Tue, 11 May 2010, Jos Elkink wrote:

Hi all,


I am running a set of logistic regressions, where we want to use some
weights, and I am not sure whether what I am doing is reasonable or
not.

The dependent variable is turnout in an election - i.e. survey
respondents were asked whether or not they voted. The percentage of
those who say they voted is much higher than the actual turnout,
probably due both to non-response bias and social desirability issues.
So now the suggestion is to weight the cases: to weight down the
respondents who say they voted and weight more heavily those who
say they did not vote. So the questions that arise from this are:

1) Is it reasonable to use the distribution of the dependent variable
to calculate the weights used in a logistic regression? It feels
wrong, but I cannot find, so far, any sources on this.


Yes and no.  There's nothing special about it being the dependent variable.  As
with any other method for handling missing data and measurement error, it
won't remove the bias completely, but it might reduce it.

However, there is something special about it being a logistic regression model
with biased sampling only on the dependent variable. This is better known as
case-control sampling: it shifts the intercept but does not bias the
coefficients of the predictors, so reweighting won't help with those.
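
The case-control point can be checked with a small simulation. The sketch
below is in plain Python (standard library only) rather than R, purely so that
it is self-contained. The true coefficients, the response probabilities in
KEEP, and the helper fit_logistic are all illustrative assumptions, not
anything taken from the survey under discussion. It models selection on the
outcome only; misreporting of turnout is a separate problem that this
argument does not cover.

```python
import math
import random


def fit_logistic(xs, ys, iters=25):
    """Fit logit P(y=1) = a + b*x by Newton-Raphson; return (a, b)."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            r = y - p              # score contribution
            w = p * (1.0 - p)      # information contribution
            g0 += r
            g1 += r * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a += da
        b += db
        if abs(da) + abs(db) < 1e-10:
            break
    return a, b


random.seed(1)
TRUE_A, TRUE_B = -0.5, 1.0      # assumed population intercept and slope
KEEP = {1: 0.9, 0: 0.3}         # assumed response rates: voters over-represented

pop_x, pop_y, sam_x, sam_y = [], [], [], []
for _ in range(40000):
    x = random.gauss(0.0, 1.0)
    p = 1.0 / (1.0 + math.exp(-(TRUE_A + TRUE_B * x)))
    y = 1 if random.random() < p else 0
    pop_x.append(x)
    pop_y.append(y)
    # Inclusion in the sample depends on the outcome y only, not on x.
    if random.random() < KEEP[y]:
        sam_x.append(x)
        sam_y.append(y)

a_pop, b_pop = fit_logistic(pop_x, pop_y)
a_sam, b_sam = fit_logistic(sam_x, sam_y)
shift = math.log(KEEP[1] / KEEP[0])   # expected intercept shift

print("population fit: intercept %.3f, slope %.3f" % (a_pop, b_pop))
print("biased sample : intercept %.3f, slope %.3f" % (a_sam, b_sam))
print("intercept shift expected from sampling ratio: %.3f" % shift)
```

With these settings the slope from the unweighted biased sample should land
close to the population slope, while the intercept moves by roughly
log(0.9/0.3), the log of the ratio of the two response rates.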

2) How to implement this in R? I tried the weights option in glm(),
but I think that is meant for when you have one row in your data for
multiple observations, not for this kind of weight. Although I have
the McCullagh and Nelder book explaining in detail how glm() operates,
I cannot find a similar book for svyglm(). Is svyglm() better for this
type of weighting?


In general svyglm() is better for this type of weighting.  The point estimates 
are the same (and in fact are obtained from glm()), but the standard errors are 
more appropriate. Under the unreasonable assumption that the weighting does 
correct the bias, the standard errors will also be correct.
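
One way to see why the point estimates agree while the standard errors differ
is to write out the weighted estimating equations. What follows is the
textbook linearization form for the simplest case (independent sampling with
unequal weights, no strata or clusters, no finite-population correction). It
is an assumption-laden sketch, not a transcription of what svyglm() computes
for a general design. The symbols w_i (weights), x_i (covariate vector
including the intercept), A and B are notation introduced here.

```latex
% Weighted score equations solved by both glm(weights = ) and svyglm():
\[
  U(\beta) = \sum_{i} w_i \, x_i \,\bigl(y_i - \mu_i(\beta)\bigr) = 0,
  \qquad \mu_i(\beta) = \frac{1}{1 + e^{-x_i^{\top}\beta}} .
\]
% Model-based variance (weights treated as precision or frequency weights):
\[
  \widehat{\mathrm{Var}}_{\text{model}}(\hat\beta) = A^{-1},
  \qquad A = \sum_i w_i \, \hat\mu_i (1 - \hat\mu_i)\, x_i x_i^{\top} .
\]
% Sandwich (linearization) variance, simplest independent-sampling case:
\[
  \widehat{\mathrm{Var}}_{\text{sandwich}}(\hat\beta) \approx A^{-1} B A^{-1},
  \qquad B = \sum_i w_i^{2} \,(y_i - \hat\mu_i)^{2}\, x_i x_i^{\top} .
\]
```

The same equation U(beta) = 0 determines the estimate in both cases, so the
coefficients coincide. Only the variance formula changes: the model-based
form reads the weights as extra information per row, whereas the sandwich
form treats them as sampling weights.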

3) Where would I find a good source describing the estimation
procedure, including weighting, applied in svyglm()?


Well, one source is the book that accompanies the package (see
http://faculty.washington.edu/tlumley/svybook/ for its web page).  I'm perhaps
not the best person to say whether it's a good source.  Chapters 5 and 6 on
regression and Chapter 7 on post-stratification, raking and calibration would be
relevant.

There is much more detail about the general weighting approach in Särndal,
Swensson and Wretman, "Model Assisted Survey Sampling".  Or you can search for
papers on "calibration" and "non-response".  The survey literature generally
will not say that much about applying these methods to regression modelling,
but the principles are the same.

    -thomas

Thomas Lumley                   Assoc. Professor, Biostatistics
tlum...@u.washington.edu        University of Washington, Seattle

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
