Re: [R] EM for missing data

ya Sun, 22 Jul 2012 05:26:11 -0700

hi Greg, David, and Tal,

Thank you very much for the information.

I found this in SPSS 17.0 missing value manual:

EM Method

This method assumes a distribution for the partially missing data and bases
inferences on the likelihood under that distribution. Each iteration consists
of an E step and an M step. The E step finds the conditional expectation of
the "missing" data, given the observed values and current estimates of the
parameters. These expectations are then substituted for the "missing" data.
In the M step, maximum likelihood estimates of the parameters are computed as
though the missing data had been filled in. "Missing" is enclosed in quotation
marks because the missing values are not being directly filled in. Instead,
functions of them are used in the log-likelihood.
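
The E and M steps described above can be sketched in base R. This is only a
minimal illustration, not what SPSS runs internally: it assumes simulated
bivariate normal data in which x is fully observed and some y values are
missing completely at random. All variable names are made up for the example.

```r
# EM for a bivariate normal (x, y) where some y values are missing.
set.seed(1)
n <- 200
x <- rnorm(n)
y <- 1 + 0.5 * x + rnorm(n, sd = 0.8)
y[sample(n, 60)] <- NA               # 60 values missing completely at random
miss <- is.na(y)

# Starting values: complete-case mean vector and covariance matrix
mu  <- c(mean(x), mean(y, na.rm = TRUE))
Sig <- cov(cbind(x, y)[!miss, ])

for (iter in 1:200) {
  # E step: expected sufficient statistics of the missing y given x
  beta <- Sig[1, 2] / Sig[1, 1]
  resv <- Sig[2, 2] - Sig[1, 2]^2 / Sig[1, 1]          # Var(y | x)
  ey   <- ifelse(miss, mu[2] + beta * (x - mu[1]), y)  # E[y | x]
  ey2  <- ifelse(miss, ey^2 + resv, y^2)               # E[y^2 | x]

  # M step: ML estimates computed from the expected sufficient statistics
  mu_new  <- c(mean(x), mean(ey))
  sxx <- mean(x^2)    - mu_new[1]^2
  sxy <- mean(x * ey) - mu_new[1] * mu_new[2]
  syy <- mean(ey2)    - mu_new[2]^2
  Sig_new <- matrix(c(sxx, sxy, sxy, syy), 2, 2)

  delta <- max(abs(c(mu_new - mu, Sig_new - Sig)))
  mu  <- mu_new
  Sig <- Sig_new
  if (delta < 1e-8) break            # stop once the estimates settle
}

print(mu)    # estimated means of x and y
print(Sig)   # estimated covariance matrix
```

Note that the E step carries E[y^2 | x], which includes the residual variance,
and not just the squared expectation. That is the sense in which "functions of"
the missing values enter the likelihood, and it is why plugging in ey as if it
were real data would understate the variance of y.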

From the literature, I gather that multiple imputation is the better
approach. What I am wondering is whether R can output a single dataset at the
end of the EM or multiple imputation process, with the estimated values
"filled in" for the missing values. I want a single dataset that I can then
use for analysis (such as ANOVA) to get the main effect of the whole IV,
rather than the main effects of each category of the IV (which is what I get
when multiple imputation is used). Some people have suggested averaging the
main effects of the individual categories to get the effect of the whole IV.
I have not found any literature discussing this, but my gut tells me it may
not be a good idea, since the estimates, standard errors, significance
levels, etc. cannot simply be averaged. I also understand that single
imputation may not be appropriate in this situation.
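
For the effect of a whole multi-category IV across imputed datasets, one
option is a pooled multi-parameter Wald test comparing the model with and
without that IV. A minimal sketch, assuming the mice package (version 3.0 or
later, which provides D1()) is installed and using its bundled nhanes2 data,
where age is a three-level factor. The model itself is chosen only for
illustration.

```r
library(mice)  # assumed installed; provides mice(), pool(), D1(), complete()

# nhanes2 ships with mice: age is a 3-level factor, bmi and chl have NAs
imp <- mice(nhanes2, m = 5, seed = 123, printFlag = FALSE)

# Fit the model with and without the multi-category IV in each imputed dataset
fit_full    <- with(imp, lm(bmi ~ age + chl))
fit_reduced <- with(imp, lm(bmi ~ chl))

# Per-coefficient results pooled with Rubin's rules (one row per age contrast)
print(summary(pool(fit_full)))

# One pooled multi-parameter Wald test for age as a whole
print(D1(fit_full, fit_reduced))

# A single completed dataset can be extracted, but it is only one of the
# m draws and should not be analyzed as if it were fully observed data
completed_1 <- complete(imp, 1)
```

The D1() result is a single test for age as a whole, which is closer to the
ANOVA-style main effect than the separate per-category coefficients.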

Maybe I am just asking for too much :)

Best regards,

ya

From: Greg Snow
Date: 2012-07-21 23:35
To: xinxi813
CC: r-help
Subject: Re: [R] EM for missing data
The EM algorithm does not impute missing data, rather it estimates
parameters when you have missing data (those parameters can then be
used to impute the missing values, but that is separate from the EM
algorithm).

If you create a dataset that has missing values imputed (a single
time) and then analyze that dataset as if there were no missing data
then your results will be wrong.  The better approach is multiple
imputation (and there are packages including MICE to do this) where
more than one new dataset is imputed (including error on the imputed
missing values), then each of the imputed datasets is analyzed (don't
look at the results yet, they are still each wrong), then the analyses
are combined to give a correct answer (well as correct as any
statistical procedure is, approximate is probably the better term).
Though this of course is assuming that your assumptions are
reasonable.
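
The impute / analyze / combine sequence can be sketched in base R. This is a
hand-rolled illustration on simulated data, not how MICE works internally: the
imputation model (a bootstrap regression plus residual noise) and all variable
names are assumptions made for the example. The combining step uses Rubin's
rules.

```r
# Multiple imputation by hand, then pooling with Rubin's rules.
set.seed(2)
n <- 200
x <- rnorm(n)
y <- 1 + 0.5 * x + rnorm(n, sd = 0.8)
y[sample(n, 60)] <- NA                    # 60 values missing at random
dat  <- data.frame(x = x, y = y)
miss <- is.na(dat$y)
m    <- 20                                # number of imputed datasets

est <- se <- numeric(m)
for (j in seq_len(m)) {
  # Bootstrap the complete cases so parameter uncertainty is carried along
  obs  <- dat[!miss, ]
  boot <- obs[sample(nrow(obs), replace = TRUE), ]
  fit  <- lm(y ~ x, data = boot)

  # Impute prediction plus random noise (error on the imputed values)
  imp <- dat
  imp$y[miss] <- predict(fit, newdata = dat[miss, ]) +
    rnorm(sum(miss), sd = summary(fit)$sigma)

  # Analyze each completed dataset separately
  a <- summary(lm(y ~ x, data = imp))$coefficients
  est[j] <- a["x", "Estimate"]
  se[j]  <- a["x", "Std. Error"]
}

# Combine: Rubin's rules
qbar <- mean(est)                  # pooled estimate
W    <- mean(se^2)                 # within-imputation variance
B    <- var(est)                   # between-imputation variance
Tvar <- W + (1 + 1 / m) * B        # total variance
pooled_se <- sqrt(Tvar)

print(c(estimate = qbar, se = pooled_se))
```

The point estimates are averaged, but the standard errors are not: the pooled
variance adds the between-imputation spread B to the average within-imputation
variance W. A single imputed dataset has no B term, which is why its standard
errors come out too small.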

If SPSS really gives you a single imputed dataset after running EM for
you to analyze using other tools then my opinion of SPSS will go down.
The reason that you probably have not found a way to do this in SAS
or R is because they are useful tools that try to not make it easy to
do the wrong thing.

On Sat, Jul 21, 2012 at 5:55 AM, ya <xinxi...@163.com> wrote:
> Hi list,
>
> I am wondering if there is a way to use the EM algorithm to handle missing
> data and get a completed dataset in R?
>
> I usually do it in SPSS because EM in SPSS kind of "fills in" the estimated
> values for the missing data, and then the completed dataset can be saved and
> used for further analysis. But I have not found a way to get a completed
> dataset like this in R or SAS. With Amelia or MICE, the dataset with missing
> values is imputed several times, and the resulting imputed datasets are not
> combined. I understand that parameters can still be estimated by combining
> the estimates from each imputed dataset, but it would be more convenient to
> have a single combined dataset for some analyses, for example, ANOVA with IVs
> having more than two categories. In this case, the only way to get the main
> effect of the whole IV is to estimate parameters in a single dataset (as far
> as I know). If the separate imputed datasets are used, the main effects shown
> in the results are for each category of the IV individually. I figure that
> readers and reviewers would sometimes like to see how big the effect is for
> the whole IV instead of the effect of each category of that IV.
>
> This is one of the reasons I can not fully move to R from SPSS. So any 
> suggestions?
>
> Thank you very much.
>
>
>
>
> ya
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com