Dear all,

I am trying to impute data for a range of variables in my data set. Unfortunately most of the variables have missing values, and some have quite a few. I set up the predictor matrix to exclude certain variables (setting the relevant elements to zero) and then ran the imputation. This works fine if I use predictive mean matching (pmm) for the continuous variables in the data set. When I use "norm" instead of pmm, the results generally look fine as well. However, for one variable I get some large out-of-range values (negative values, although the observed minimum is positive). Here are summary statistics before and after imputation:
> summary(aux$emitters)  # original data
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.      NA's
  0.00219   2.10200   7.33800  17.87000  23.15000 136.20000  52.00000

> summary(complete(imp2)$emitters)  # imputation 1
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
-68.920   2.062  10.000  19.980  32.980 136.200

> summary(complete(imp2, 2)$emitters)  # imputation 2 (looks better)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
-30.650   1.848   8.808  20.480  32.980 136.200

etc.

Now my question is: in such cases, would it be better to use pmm for this variable, or should I keep "norm" and use the squeeze() function in mice to restrict the imputed values? I read a paper explaining MICE (http://www.stefvanbuuren.nl/publications/MICE%20in%20R%20-%20Draft.pdf), but I am still unsure how to proceed.

I would be really grateful for some advice, thanks!
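
P.S. To make the two options concrete, here is a minimal sketch of how I understand them. It is not my actual code: it uses the nhanes example data that ships with mice, with the column chl standing in for my emitters variable, and the object names (dat, ini, meth, meth2, post, imp_pmm, imp_sq, lo, hi) are made up for illustration. The bounds passed to squeeze() are simply the observed range of the stand-in variable, which is an assumption; other bounds may be more appropriate.

```r
library(mice)

# Stand-in data: 'nhanes' ships with mice; 'chl' plays the role of 'emitters'.
dat <- nhanes

# Dry run (maxit = 0) to obtain the default method and post-processing vectors.
ini  <- mice(dat, maxit = 0, printFlag = FALSE)
meth <- ini$method
post <- ini$post

# Option 1: use "norm" for the other incomplete variables, but pmm for the
# problem variable, so its imputations are drawn from observed values.
meth[c("bmi", "hyp")] <- "norm"
meth["chl"] <- "pmm"
imp_pmm <- mice(dat, method = meth, m = 5, seed = 1, printFlag = FALSE)

# Option 2: keep "norm" for the problem variable, but squeeze its imputed
# values into a plausible range after each iteration via the 'post' argument.
meth2 <- meth
meth2["chl"] <- "norm"
lo <- min(dat$chl, na.rm = TRUE)  # assumed lower bound: observed minimum
hi <- max(dat$chl, na.rm = TRUE)  # assumed upper bound: observed maximum
post["chl"] <- sprintf(
  "imp[[j]][, i] <- squeeze(imp[[j]][, i], c(%s, %s))", lo, hi
)
imp_sq <- mice(dat, method = meth2, post = post, m = 5, seed = 1,
               printFlag = FALSE)

# Compare the completed variable under the two options.
summary(mice::complete(imp_pmm)$chl)
summary(mice::complete(imp_sq)$chl)
```

In my own data the same pattern would apply to aux with "emitters" in place of "chl", passing my existing predictor matrix via the predictorMatrix argument.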