Re: [R] Quantiles on multiply imputed survey data - mitools

Anne Bichteler Wed, 11 May 2016 09:12:02 -0700

Thanks for looking. No, for the quantiles it fails to instantiate the 
collection of designs correctly, whether hard-coding the variable name or using 
make.formula. 'with' passes make.formula correctly when calculating the mean, 
e.g. this works:


MIcombine( with(des, svymean(make.formula(get('var_name')))))

# Here's a reproducible example.

DF1 <- data.frame(SDMVPSU = c(1,1,1,1,1,2,2,2,2,2), 
                  SDMVSTRA = c(22, 20, 24, 18, 20, 22, 20, 24, 18, 20),
                  WTSPO2YR = c(252605, 82199, 24946, 147236, 3679, 294959, 
65085, 21765, 197775, 49931),
                  LBXTCD = c(20.4, 29.7, 8.8, 18.0, 22.2, 10.4, 43.9, 15.3, 
13.8, 84.5))

DF2 <- data.frame(SDMVPSU = c(1,1,1,1,1,2,2,2,2,2), 
                  SDMVSTRA = c(22, 20, 24, 18, 20, 22, 20, 24, 18, 20),
                  WTSPO2YR = c(252605, 82199, 24946, 147236, 3679, 294959, 
65085, 21765, 197775, 49931),
                  LBXTCD = c(21.9, 29.7, 9.2, 5.9, 32.8, 8.9, 43.9, 7.4, 10.5, 
84.5))

var_name <- "LBXTCD"

# Individually svyquantile (and svymean) work:
des_single1 <- svydesign(id=~SDMVPSU, strat=~SDMVSTRA, weight=~WTSPO2YR, 
data=Df1_red, nest=TRUE)
svyquantile(make.formula(get('var_name')), des_single1, c(.5), na.rm = FALSE)

des_single2 <- svydesign(id=~SDMVPSU, strat=~SDMVSTRA, weight=~WTSPO2YR, 
data=Df2_red, nest=TRUE)
svyquantile(make.formula(get('var_name')), des_single2, c(.5), na.rm = FALSE)

Imputed_list <- c()
Imputed_list[[1]] <- DF1
Imputed_list[[2]] <- DF2

# svymean works (so the svydesign object is fine?) but svyquantile doesn't:
des_mult <- svydesign(id=~SDMVPSU, strat=~SDMVSTRA, weight=~WTSPO2YR, 
data=imputationList(Imputed_list), nest=TRUE)
M_mean <- with(des_mult, svymean(make.formula(get('var_name'))))
summary(M_mean)
M_quantile <- with(des_mult, svyquantile(make.formula(get('var_name')), 
quantiles = c(.5)))
summary(M_quantile)


Thanks again,

Brennan

www.toxstrategies.com


From:  Anthony Damico <[email protected]>
Date:  Tuesday, May 10, 2016 at 10:37 PM
To:  Anne Bichteler <[email protected]>
Cc:  "[email protected]" <[email protected]>
Subject:  Re: [R] Quantiles on multiply imputed survey data - mitools


is the `with` not passing make.formula( get( 'var_name' ) ) through to 
svyquantile for some reason?  does this work?

MIcombine( with(des, svyquantile(~LBXTCD, .5)))



if that's not it, could you make a minimal reproducible example that includes 
the data download?  code to download and import nhanes here

https://github.com/ajdamico/asdfree/tree/master/National%20Health%20and%20Nutrition%20Examination%20Survey





On Tue, May 10, 2016 at 4:33 PM, Anne Bichteler 
<[email protected]> wrote:

Hello, and thank you for considering this question:

The svystat object created with multiply imputed NHANES data files is failing 
on calling survey::svyquantile. I'm wondering if I'm diagnosing the issue 
correctly, whether the behavior is expected, and whether y'all might have any 
ideas for workarounds.

I'm following T. Lumley's general method outlined here: 
http://faculty.washington.edu/tlumley/old-survey/svymi.html 
<http://faculty.washington.edu/tlumley/old-survey/svymi.html>, but with data 
files I've imputed myself on the 2001/2002 biennial. Each file has 1081 
observations and no missing values.

### Create the survey design object with list of imputed data files 
ImputedList0102.
des <- svydesign(id=~SDMVPSU, strat=~SDMVSTRA, weight=~WTSPO2YR, 
data=imputationList(ImputedList0102), nest=TRUE)


### Blood analyte of interest
var_name <- "LBXTCD" # analyte in blood serum

### All is well calculating the mean:
M <- with(des, svymean(make.formula(get('var_name'))))
summary(M)
Result <- MIcombine(M)
Result$coefficients
# LBXTCD
# 17.41635


### but svystat object fails to calculate a 50th percentile:
### it fails when hard-coding the name rather than using make.formula;
### it fails regardless of number of files or choices in handling ties or 
interval type.
### There are 16 ties in each data file.
M1 <- with(des, svyquantile(make.formula(get('var_name')), quantiles = c(.5)))
summary(M1)

#     Length Class  Mode
#[1,] 1      -none- numeric
#[2,] 1      -none- numeric
#[3,] 1      -none- numeric


### The quantile is successfully calculated on one file at a time, however, and 
is different for each file.
### (had thought perhaps there was a lack-of-variance issue). The quantile 
calculated on each file
### is the same regardless of interval.type.
des_single1 <- svydesign(id=~SDMVPSU, strat=~SDMVSTRA, weight=~WTSPO2YR, 
data=ImputedList0102[[1]], nest=TRUE)
svyquantile(make.formula(get('var_name')), des_single1, c(.5))
# 0.5
# LBXTCD 13.5554


des_single2 <- svydesign(id=~SDMVPSU, strat=~SDMVSTRA, weight=~WTSPO2YR, 
data=ImputedList0102[[2]], nest=TRUE)
svyquantile(make.formula(get('var_name')), des_single2, c(.5))
# 0.5
# LBXTCD 14.06154

# The number of observations exceeding the 50th percentile differs for each 
file, which I can't claim to understand.

# I removed the 16 ties, but no help. Do the ties and/or different number of 
observations above/below prevent the svydesigns from being combined?
nrow(subset(ImputedList0102[[1]], LBXTCD > 13.5554))
# [1] 516
nrow(subset(ImputedList0102[[2]], LBXTCD > 14.06154))
# [1] 512


I'm hoping someone can point me to some gross error I'm making or another 
function parameter or data manipulation or another survey-savvy method 
altogether to calculate a 50th percentile across multiply imputed data files. 
Thanks for any advice,

Brennan

www.toxstrategies.com <http://www.toxstrategies.com>
______________________________________________
[email protected] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html 
<http://www.R-project.org/posting-guide.html>
and provide commented, minimal, self-contained, reproducible code.





______________________________________________
[email protected] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Quantiles on multiply imputed survey data - mitools

Reply via email to