There are titanic datasets in R binary format at
http://biostat.mc.vanderbilt.edu/DataSets
Note that the aregImpute function in the Hmisc package streamlines many
of the steps, in conjunction with the fit.mult.impute function.
Frank
On 06/30/2010 05:02 AM, Chuck Cleland wrote:
On 6/30/2010 1:14 AM, Daniel Chen wrote:
Hi,
I am a long time SPSS user but new to R, so please bear with me if my
questions seem to be too basic for you guys.
I am trying to figure out how to analyze survey data using logistic
regression with multiple imputation.
I have a survey data of about 200,000 cases and I am trying to predict the
odds ratio of a dependent variable using 6 categorical independent variables
(dummy-coded). Approximatively 10% of the cases (~20,000) have missing data
in one or more of the independent variables. The percentage of missing
ranges from 0.01% to 10% for the independent variables.
My current thinking is to conduct a logistic regression with multiple
imputation, but I don't know how to do it in R. I searched the web but
couldn't find instructions or examples on how to do this. Since SPSS is
hopeless with missing data, I have to learn to do this in R. I am new to R,
so I would really appreciate if someone can show me some examples or tell me
where to find resources.
Here is an example using the Amelia package to generate imputations
and the mitools and mix packages to make the pooled inferences.
titanic<-
read.table("http://lib.stat.cmu.edu/S/Harrell/data/ascii/titanic.txt",
sep=',', header=TRUE)
set.seed(4321)
titanic$sex[sample(nrow(titanic), 10)]<- NA
titanic$pclass[sample(nrow(titanic), 10)]<- NA
titanic$survived[sample(nrow(titanic), 10)]<- NA
library(Amelia) # generate multiple imputations
library(mitools) # for MIextract()
library(mix) # for mi.inference()
titanic.amelia<- amelia(subset(titanic,
select=c('survived','pclass','sex','age')),
m=10, noms=c('survived','pclass','sex'),
emburn=c(500,500))
allimplogreg<- lapply(titanic.amelia$imputations,
function(x){glm(survived ~ pclass + sex + age, family=binomial, data = x)})
mice.betas.glm<- MIextract(allimplogreg, fun=function(x){coef(x)})
mice.se.glm<- MIextract(allimplogreg, fun=function(x){sqrt(diag(vcov(x)))})
as.data.frame(mi.inference(mice.betas.glm, mice.se.glm))
# Or using only mitools for pooled inference
betas<- MIextract(allimplogreg, fun=coef)
vars<- MIextract(allimplogreg, fun=vcov)
summary(MIcombine(betas,vars))
Thank you!
Daniel
[[alternative HTML version deleted]]
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
--
Frank E Harrell Jr Professor and Chairman School of Medicine
Department of Biostatistics Vanderbilt University
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.