Dear R-group,


I have run into a problem in estimating confidence intervals for the median 
difference.

I want to establish a confidence interval at (1- alpha) level for the 
difference between the 

medians of two indipendent samples (size n and m), by using the Wilcoxon 
distribution 

or with bootstrap methods. 

First method, we consider the z matrix of d=n∙m differences of the first and 
second sample 

data and we order these differences in a y vector. By the Wilcoxon 
distribuition (W) we 

determine the q quantile such that Prob(W<q)= alpha/2, inf and sup of 
confidence interval are 

respectively the q-th and (d-q+1)-th elements of the y vector, while we obtain 
the difference 

median by the y distribution. This method is also used to establish the CI by 
wilcox.test.    

Example. Two indipendent sample A and B (n=11, m=13)  of CD4 count cells 
(T-helper cells):  

A =c(619, 600, 490, 1076, 654, 955, 563, 955, 827, 873, 1253)

B =c(346, 507, 598, 228, 576, 338, 1153, 354, 560, 517, 381, 415, 626)

 

1)      CI 95% by matrix z and y vector:

n=length(A)

m=length(B)

d=n*m

z=matrix(0,m,n)

for(j in seq_len(n))

z[, j]=A[j] - B

y=sort(as.vector(z))

q=qwilcox(0.05/2,n,m,lower.tail = TRUE, log.p = FALSE) 

inf=y[q]

sup=y[d-q+1]

med=median(y)

results:  inf = 100, sup = 516 and med = 300. 

 

2)      CI 95% by wilcox.test:    

      I=wilcox.test(A,B,conf.lev=0.95,conf.int=TRUE,exact=F,correct=T)

     inf=I$conf.int[1]

     sup=I$conf.int[2]. 

results:  inf = 99.9, sup = 516.

 

Second method, bootstrap each sample separately, creating the sampling 
distribution for 

each median. Then calculate the difference between the two medians, and create 
the 

sampling distribution of those differences. This is the sampling distribution 
we care about. 

Once we have that distribution we can establish a confidence interval. Some CI 
95%, 

with reference to the CD4 example, one given below.

 

1)     First procedure (package boot):

library(boot)

      n=length(A)

      m=length(B)

      y=c(A,B)

     camp=data.frame(group=rep(c(1,2),c(n,m)),y)

     dif.median=function(data,i) {

     d=data[i,]

     n1=n+1

     m1=n+m

     median(d$y[1:n])-median(d$y[n1:m1])  }

     dif.boot=boot(camp,dif.median,R=10000, strata=camp$group)

     boot.ci(dif.boot, conf =0.95, type="bca")

results:  inf = 59, sup = 574.

 

2)      Second procedure (package pairwiseCI):

library(pairwiseCI)

MedDiff=Median.diff(A, B, conf.level=0.95, alternative="two.sided",R=10000)

MedDiff$conf.int

MedDiff$estimate

results:  inf = 56, sup = 574, median=320

 

3)      Third procedure (stratified bootstrap):

      dif <- numeric(10000)

      for(i in seq_len(10000))

      dif[i] <- median(sample(A, replace=TRUE)) - median(sample(B, 
replace=TRUE))

      quantile(dif,prob=c(0.5,(1-0.95)/2,(1-(1-0.95)/2)))

results: inf = 56, sup = 574, median = 313.

 

4)      Fourth procedure (package simpleboot)

 library(simpleboot)

 boot_diff <- two.boot(A, B, median, R = 10000)

 boot.ci(boot_diff,conf=0.95,type="bca")

results: inf = 59, sup = 574.

 

The bootstrap procedures do get the same results, but the confidence intervals 
are 

significantly different from those obtained using the method that refers to the 
Wilcoxon 

distribution.
Problem: does this difference depend on really "different" methods 

or on incorrect implementation of the bootstrap technique?

 

I will greatly appreciate any clarification you could provide.

Best regards.

Vittorio Colagrande

        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to