Re: [R] Logistic regression with multiple imputation

Frank E Harrell Jr Wed, 30 Jun 2010 05:45:49 -0700

There are titanic datasets in R binary format athttp://biostat.mc.vanderbilt.edu/DataSets

Note that the aregImpute function in the Hmisc package streamlines manyof the steps, in conjunction with the fit.mult.impute function.


Frank


On 06/30/2010 05:02 AM, Chuck Cleland wrote:

On 6/30/2010 1:14 AM, Daniel Chen wrote:

Hi,

I am a long time SPSS user but new to R, so please bear with me if my
questions seem to be too basic for you guys.

I am trying to figure out how to analyze survey data using logistic
regression with multiple imputation.

I have a survey data of about 200,000 cases and I am trying to predict the
odds ratio of a dependent variable using 6 categorical independent variables
(dummy-coded). Approximatively 10% of the cases (~20,000) have missing data
in one or more of the independent variables. The percentage of missing
ranges from 0.01% to 10% for the independent variables.

My current thinking is to conduct a logistic regression with multiple
imputation, but I don't know how to do it in R. I searched the web but
couldn't find instructions or examples on how to do this. Since SPSS is
hopeless with missing data, I have to learn to do this in R. I am new to R,
so I would really appreciate if someone can show me some examples or tell me
where to find resources.


   Here is an example using the Amelia package to generate imputations
and the mitools and mix packages to make the pooled inferences.

titanic<-
read.table("http://lib.stat.cmu.edu/S/Harrell/data/ascii/titanic.txt";,
sep=',', header=TRUE)

set.seed(4321)

titanic$sex[sample(nrow(titanic), 10)]<- NA
titanic$pclass[sample(nrow(titanic), 10)]<- NA
titanic$survived[sample(nrow(titanic), 10)]<- NA

library(Amelia) # generate multiple imputations
library(mitools) # for MIextract()
library(mix) # for mi.inference()

titanic.amelia<- amelia(subset(titanic,
select=c('survived','pclass','sex','age')),
                          m=10, noms=c('survived','pclass','sex'),
emburn=c(500,500))

allimplogreg<- lapply(titanic.amelia$imputations,
function(x){glm(survived ~ pclass + sex + age, family=binomial, data = x)})

mice.betas.glm<- MIextract(allimplogreg, fun=function(x){coef(x)})
mice.se.glm<- MIextract(allimplogreg, fun=function(x){sqrt(diag(vcov(x)))})

as.data.frame(mi.inference(mice.betas.glm, mice.se.glm))

# Or using only mitools for pooled inference

betas<- MIextract(allimplogreg, fun=coef)
vars<- MIextract(allimplogreg, fun=vcov)
summary(MIcombine(betas,vars))

Thank you!

Daniel

        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Frank E Harrell Jr   Professor and Chairman        School of Medicine
                     Department of Biostatistics   Vanderbilt University

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Logistic regression with multiple imputation

Reply via email to