Dear Emmanuel,

Thank you.
Yes, I broadly agree with what you say. I think ML is a better strategy than complete case, because I think its estimates will be more robust than complete case. For unbiased estimates, I think ML requires the data to be MAR, whereas complete case requires the data to be MCAR. Anyway, I would have thought ML could be done without resorting to multiple imputation, but I'm at the edge of my knowledge here.

Thanks once again,
regards
Desmond

From: Emmanuel Charpentier <charpent <at> bacbuc.dyndns.org>
Subject: Re: logistic regression in an incomplete dataset
Newsgroups: gmane.comp.lang.r.general
Date: 2010-04-05 19:58:20 GMT

Dear Desmond,

a somewhat analogous question was posed recently (about two weeks ago) on the sig-mixed-model list, and I tried (in two posts) to give some elements of information (and some bibliographic pointers). To summarize tersely:

- A model of "information missingness" (i.e. *why* are some data missing?) is necessary to choose the right measures to take. Two special cases (Missing At Random and Missing Completely At Random) allow for (semi-)automated compensation. See the literature for further details.

- Complete-case analysis may give seriously weakened and *biased* results. Pairwise-complete-case analysis is usually *worse*.

- Simple imputation leads to underestimated variances and may also give biased results.

- Multiple imputation is currently thought of as a good way to alleviate missing-data problems if you have a missingness model (or can honestly bet on MCAR or MAR), and if you properly combine the results of your imputations.

- A few missing-data packages exist in R to handle this case. My personal selection at this point would be mice, mi, Amelia, and possibly mitools, but none of them is fully satisfying (in particular, accounting for a random effect needs special handling all the way through in all packages...). See the first sketch appended after the quoted message below.

- An interesting alternative is to write a full probability model (in BUGS, for example) and use Bayesian estimation; in this framework, missing data are "naturally" modelled within the model used for the analysis. However, this may entail *large* amounts of work, be difficult, and not always succeed (numerical difficulties). Furthermore, the results of a Bayesian analysis might not be what you seek... See the second sketch appended below.

HTH,

Emmanuel Charpentier

On Monday, 5 April 2010 at 11:34 +0100, Desmond Campbell wrote:
> Dear all,
>
> I want to do a logistic regression.
> So far I've only found out how to do that in R in a dataset of complete cases.
> I'd like to do logistic regression via maximum likelihood, using all the study cases (complete and incomplete). Can you help?
>
> I'm using glm() with family=binomial(logit).
> If any covariate in a study case is missing, then the study case is dropped, i.e. it is doing a complete-cases analysis.
> As a lot of study cases are being dropped, I'd rather it did maximum likelihood using all the study cases.
> I tried setting glm()'s na.action to NULL, but then it complained about NAs present in the study cases.
> I've about 1000 unmatched study cases and fewer than 10 covariates, so I could use unconditional ML estimation (as opposed to conditional ML estimation).
>
> regards
> Desmond
>
>
> --
> Desmond Campbell
> UCL Genetics Institute
> d.campb...@ucl.ac.uk
> Tel. ext. 020 31084006, int. 54006
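
To make the multiple-imputation route concrete, here is a minimal sketch (not part of the original exchange) using the mice package. The data frame mydata, the binary outcome y, and the covariates x1 and x2 are placeholder names, not objects from the thread:

    library(mice)   # multiple imputation by chained equations

    ## Generate 5 completed datasets from the incomplete data frame
    imp <- mice(mydata, m = 5, seed = 1)

    ## Fit the same logistic regression on each completed dataset
    fit <- with(imp, glm(y ~ x1 + x2, family = binomial(logit)))

    ## Pool the 5 sets of estimates with Rubin's rules
    summary(pool(fit))

pool() combines the per-imputation fits so that the extra uncertainty introduced by imputing shows up in the standard errors, which is exactly what single (simple) imputation fails to do.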
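The full-probability-model alternative can be sketched with JAGS driven from R through the rjags package; again, the model and the variable names are illustrative assumptions rather than code from the thread. Giving the incomplete covariate x1 its own distribution makes the sampler impute its missing values alongside the regression coefficients (x2 is assumed fully observed in this toy model):

    library(rjags)

    model_string <- "
    model {
      for (i in 1:N) {
        y[i] ~ dbern(p[i])
        logit(p[i]) <- b0 + b1 * x1[i] + b2 * x2[i]
        # x1 has a distribution, so its NAs are sampled (imputed) by JAGS
        x1[i] ~ dnorm(mu.x1, tau.x1)
      }
      # Vague priors for the regression coefficients and for x1's distribution
      b0 ~ dnorm(0, 0.01)
      b1 ~ dnorm(0, 0.01)
      b2 ~ dnorm(0, 0.01)
      mu.x1 ~ dnorm(0, 0.01)
      tau.x1 ~ dgamma(0.01, 0.01)
    }"

    jm <- jags.model(textConnection(model_string),
                     data = list(y = mydata$y, x1 = mydata$x1,
                                 x2 = mydata$x2, N = nrow(mydata)),
                     n.chains = 3)
    update(jm, 1000)                                   # burn-in
    post <- coda.samples(jm, c("b0", "b1", "b2"), n.iter = 5000)
    summary(post)

Note that dnorm() in the BUGS/JAGS language is parameterised by precision, so dnorm(0, 0.01) is a diffuse prior with standard deviation 10.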