Thanks for looking. No, for the quantiles it fails to instantiate the collection of designs correctly, whether hard-coding the variable name or using make.formula. 'with' passes make.formula correctly when calculating the mean, e.g. this works:
MIcombine( with(des, svymean(make.formula(get('var_name'))))) # Here's a reproducible example. DF1 <- data.frame(SDMVPSU = c(1,1,1,1,1,2,2,2,2,2), SDMVSTRA = c(22, 20, 24, 18, 20, 22, 20, 24, 18, 20), WTSPO2YR = c(252605, 82199, 24946, 147236, 3679, 294959, 65085, 21765, 197775, 49931), LBXTCD = c(20.4, 29.7, 8.8, 18.0, 22.2, 10.4, 43.9, 15.3, 13.8, 84.5)) DF2 <- data.frame(SDMVPSU = c(1,1,1,1,1,2,2,2,2,2), SDMVSTRA = c(22, 20, 24, 18, 20, 22, 20, 24, 18, 20), WTSPO2YR = c(252605, 82199, 24946, 147236, 3679, 294959, 65085, 21765, 197775, 49931), LBXTCD = c(21.9, 29.7, 9.2, 5.9, 32.8, 8.9, 43.9, 7.4, 10.5, 84.5)) var_name <- "LBXTCD" # Individually svyquantile (and svymean) work: des_single1 <- svydesign(id=~SDMVPSU, strat=~SDMVSTRA, weight=~WTSPO2YR, data=Df1_red, nest=TRUE) svyquantile(make.formula(get('var_name')), des_single1, c(.5), na.rm = FALSE) des_single2 <- svydesign(id=~SDMVPSU, strat=~SDMVSTRA, weight=~WTSPO2YR, data=Df2_red, nest=TRUE) svyquantile(make.formula(get('var_name')), des_single2, c(.5), na.rm = FALSE) Imputed_list <- c() Imputed_list[[1]] <- DF1 Imputed_list[[2]] <- DF2 # svymean works (so the svydesign object is fine?) but svyquantile doesn't: des_mult <- svydesign(id=~SDMVPSU, strat=~SDMVSTRA, weight=~WTSPO2YR, data=imputationList(Imputed_list), nest=TRUE) M_mean <- with(des_mult, svymean(make.formula(get('var_name')))) summary(M_mean) M_quantile <- with(des_mult, svyquantile(make.formula(get('var_name')), quantiles = c(.5))) summary(M_quantile) Thanks again, Brennan www.toxstrategies.com From: Anthony Damico <ajdam...@gmail.com> Date: Tuesday, May 10, 2016 at 10:37 PM To: Anne Bichteler <abichte...@toxstrategies.com> Cc: "r-help@r-project.org" <r-help@r-project.org> Subject: Re: [R] Quantiles on multiply imputed survey data - mitools is the `with` not passing make.formula( get( 'var_name' ) ) through to svyquantile for some reason? does this work? MIcombine( with(des, svyquantile(~LBXTCD, .5))) if that's not it, could you make a minimal reproducible example that includes the data download? code to download and import nhanes here https://github.com/ajdamico/asdfree/tree/master/National%20Health%20and%20Nutrition%20Examination%20Survey On Tue, May 10, 2016 at 4:33 PM, Anne Bichteler <abichte...@toxstrategies.com> wrote: Hello, and thank you for considering this question: The svystat object created with multiply imputed NHANES data files is failing on calling survey::svyquantile. I'm wondering if I'm diagnosing the issue correctly, whether the behavior is expected, and whether y'all might have any ideas for workarounds. I'm following T. Lumley's general method outlined here: http://faculty.washington.edu/tlumley/old-survey/svymi.html <http://faculty.washington.edu/tlumley/old-survey/svymi.html>, but with data files I've imputed myself on the 2001/2002 biennial. Each file has 1081 observations and no missing values. ### Create the survey design object with list of imputed data files ImputedList0102. des <- svydesign(id=~SDMVPSU, strat=~SDMVSTRA, weight=~WTSPO2YR, data=imputationList(ImputedList0102), nest=TRUE) ### Blood analyte of interest var_name <- "LBXTCD" # analyte in blood serum ### All is well calculating the mean: M <- with(des, svymean(make.formula(get('var_name')))) summary(M) Result <- MIcombine(M) Result$coefficients # LBXTCD # 17.41635 ### but svystat object fails to calculate a 50th percentile: ### it fails when hard-coding the name rather than using make.formula; ### it fails regardless of number of files or choices in handling ties or interval type. ### There are 16 ties in each data file. M1 <- with(des, svyquantile(make.formula(get('var_name')), quantiles = c(.5))) summary(M1) # Length Class Mode #[1,] 1 -none- numeric #[2,] 1 -none- numeric #[3,] 1 -none- numeric ### The quantile is successfully calculated on one file at a time, however, and is different for each file. ### (had thought perhaps there was a lack-of-variance issue). The quantile calculated on each file ### is the same regardless of interval.type. des_single1 <- svydesign(id=~SDMVPSU, strat=~SDMVSTRA, weight=~WTSPO2YR, data=ImputedList0102[[1]], nest=TRUE) svyquantile(make.formula(get('var_name')), des_single1, c(.5)) # 0.5 # LBXTCD 13.5554 des_single2 <- svydesign(id=~SDMVPSU, strat=~SDMVSTRA, weight=~WTSPO2YR, data=ImputedList0102[[2]], nest=TRUE) svyquantile(make.formula(get('var_name')), des_single2, c(.5)) # 0.5 # LBXTCD 14.06154 # The number of observations exceeding the 50th percentile differs for each file, which I can't claim to understand. # I removed the 16 ties, but no help. Do the ties and/or different number of observations above/below prevent the svydesigns from being combined? nrow(subset(ImputedList0102[[1]], LBXTCD > 13.5554)) # [1] 516 nrow(subset(ImputedList0102[[2]], LBXTCD > 14.06154)) # [1] 512 I'm hoping someone can point me to some gross error I'm making or another function parameter or data manipulation or another survey-savvy method altogether to calculate a 50th percentile across multiply imputed data files. Thanks for any advice, Brennan www.toxstrategies.com <http://www.toxstrategies.com> ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html <http://www.R-project.org/posting-guide.html> and provide commented, minimal, self-contained, reproducible code. ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.