Re: [R] significance test interquartile ranges

Rui Barradas Sat, 14 Jul 2012 03:27:49 -0700

Hello,

There's a test for IQR equality, due to Westenberg (1948), that can be found on-line if one really looks. It starts by pooling the two samples into one and computing the 1st and 3rd quartiles of the pool. Then a three-column table is built, with rows corresponding to the samples: the middle column holds the counts between the quartiles and the side columns the counts outside them. The two side columns are collapsed into one and a Fisher exact test is conducted on the resulting 2x2 table.


R code could be:


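iqr.test <- function(x, y){
        # Quartiles of the pooled sample
        qq <- quantile(c(x, y), probs = c(0.25, 0.75))
        # a, c: counts strictly inside the pooled quartiles; b, d: counts outside
        a <- sum(qq[1] < x & x < qq[2])
        b <- length(x) - a
        c <- sum(qq[1] < y & y < qq[2])
        d <- length(y) - c
        # 2x2 table: rows are the samples, columns are inside/outside
        m <- matrix(c(a, c, b, d), ncol = 2)
        # Log of the hypergeometric probability of the observed table
        numer <- sum(lfactorial(c(margin.table(m, 1), margin.table(m, 2))))
        denom <- sum(lfactorial(c(a, b, c, d, sum(m))))
        # Twice the probability of the observed table; fisher.test(m) would
        # instead sum over all tables at least as extreme
        p.value <- 2*exp(numer - denom)
        data.name <- deparse(substitute(x))
        data.name <- paste(data.name, ", ", deparse(substitute(y)), sep="")
        method <- "Westenberg-Mood test for IQR equality"
        alternative <- "the IQRs are not equal"
        ht <- list(
                p.value = p.value,
                method = method,
                alternative = alternative,
                data.name = data.name
        )
        class(ht) <- "htest"
        ht
}

# Small simulation: normal versus chi-squared samples of random sizes
n <- 1e3
pv <- numeric(n)
set.seed(2319)
for(i in 1:n){
        x <- rnorm(sample(20:30, 1), 4, 1)
        y <- rchisq(sample(20:40, 1), df=4)
        pv[i] <- iqr.test(x, y)$p.value
}

sum(pv < 0.05)/n  # proportion rejected at the 5% level (0.8 reported in the original post)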
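
The function above doubles the probability of the observed table. As an alternative sketch, the collapsed 2x2 table can be handed to fisher.test() from the stats package, which sums over all tables at least as extreme. The name iqr.fisher and the example data are made up for illustration, and this is not claimed to be Westenberg's exact procedure.

```r
# Sketch only: iqr.fisher is a hypothetical name, not part of any package.
iqr.fisher <- function(x, y) {
  # Quartiles of the pooled sample
  qq <- quantile(c(x, y), probs = c(0.25, 0.75))
  # Counts strictly inside the pooled quartiles, per sample
  in.x <- sum(qq[1] < x & x < qq[2])
  in.y <- sum(qq[1] < y & y < qq[2])
  # 2x2 table: rows are samples, columns are inside/outside
  m <- matrix(c(in.x, in.y, length(x) - in.x, length(y) - in.y), ncol = 2,
              dimnames = list(sample = c("x", "y"),
                              position = c("inside", "outside")))
  fisher.test(m)
}

# Made-up example: same centre, different spread
set.seed(1)
x <- rnorm(30, mean = 4, sd = 1)
y <- rnorm(30, mean = 4, sd = 3)
res <- iqr.fisher(x, y)
res$p.value
```

The result is an ordinary "htest" object, so it prints like any other test in R.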


Hope this helps,

Rui Barradas

On 14-07-2012 09:01, peter dalgaard wrote:


On Jul 14, 2012, at 08:16 , Prof Brian Ripley wrote:

On 13/07/2012 21:37, Greg Snow wrote:

A permutation test may be appropriate:


Yes, it may, but precisely which one is unclear.  You are testing whether the 
two samples have an identical distribution, whereas I took the question to be a 
test of differences in dispersion, with differences in location allowed.

I do not think this can be solved without further assumptions.  E.g. people 
often replace the two-sample t-test by the two-sample Wilcoxon test as a test 
of differences in location, not realizing that the latter is also sensitive to 
other aspects of the difference (e.g. both dispersion and shape).


(Brian knows this, of course, but I thought it useful to insert a little 
quibbling.)

"Sensitive" is perhaps a little misleading here. The test statistic in the 
Wilcoxon test is essentially an estimate of the probability that a random observation in 
one group is bigger than a random observation in the other group. It isn't hard to 
imagine a situation where that quantity is unaffected by a dispersion change, so the test is 
not sensitive in the sense that it can detect dispersion changes between sufficiently 
large samples.
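
A small illustration of that point, as a sketch with made-up data: two normal samples with the same centre but very different spread. In R, the W statistic from wilcox.test() counts the pairs with x > y, so W divided by the number of pairs estimates P(X > Y). The sample size and standard deviations are arbitrary choices.

```r
set.seed(42)
n <- 200
x <- rnorm(n, mean = 0, sd = 1)
y <- rnorm(n, mean = 0, sd = 3)   # same centre, three times the spread

w <- wilcox.test(x, y)

# With no ties, W is the number of (x, y) pairs with x > y,
# so W / (n * n) estimates P(X > Y)
p.hat <- unname(w$statistic) / (n * n)
p.hat      # should be close to 0.5 despite the dispersion difference
w$p.value
```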

However, the point is that p values _rely on_ the null hypothesis that two 
distributions are exactly the same. This is mostly uncontroversial if you are 
testing for an irrelevant grouping, but if you need confidence intervals for 
the difference, you are implicitly assuming a location-shift model.

The same thing is true for permutation tests in general: You need to be rather 
careful about what the assumptions are that allow you to interchange things. 
Asymptotically, the distribution of the IQR depends on the values of the 
density at the true quartiles. These could be different in the two groups, and 
easily completely unrelated to those of a pooled sample.

I think that I would suggest finding an error estimate for the IQR (or maybe 
log IQR) in each group separately, perhaps by bootstrapping, and then compare 
between groups with an asymptotic z test. The main caveat is whether you have 
sufficiently large sample sizes for asymptotics to hold.
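
A minimal sketch of that suggestion, assuming a plain nonparametric bootstrap within each group and a normal approximation for the difference in log IQR. The function names boot.log.iqr and iqr.z.test, the number of resamples and the example data are all made up for illustration.

```r
# Sketch only: function names are hypothetical.
# Bootstrap standard error of log(IQR) within one group.
# Assumes continuous data, so a resample with IQR == 0 is very unlikely.
boot.log.iqr <- function(x, R = 2000) {
  est <- log(IQR(x))
  reps <- replicate(R, log(IQR(sample(x, replace = TRUE))))
  list(est = est, se = sd(reps))
}

# Asymptotic z test comparing log IQR between two independent groups
iqr.z.test <- function(x, y, R = 2000) {
  bx <- boot.log.iqr(x, R)
  by <- boot.log.iqr(y, R)
  z <- (bx$est - by$est) / sqrt(bx$se^2 + by$se^2)
  c(z = z, p.value = 2 * pnorm(-abs(z)))
}

set.seed(123)
x <- rnorm(100, 4, 1)
y <- rchisq(120, df = 4)
out <- iqr.z.test(x, y)
out
```

Because each group is resampled separately, nothing here requires the two distributions to have the same location or shape; the price is reliance on the asymptotics.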

Peter D.


I nearly suggested (yesterday) doing the permutation test on differences from 
medians in the two groups.  But really this is off-topic for R-help and needs 
interaction with a knowledgeable statistician to refine the question.

1. compute the ratio of the 2 IQR values (or other comparison of interest)
2. combine the data from the 2 samples into 1 pool, then randomly
split into 2 groups (matching sample sizes of original) and compute
the ratio of the IQR values for the 2 new samples.
3. repeat #2 a bunch of times (like for a total of 999 random splits)
and combine with the original value.
4. (optional, but strongly suggested) plot a histogram of all the
ratios and place a reference line of the original ratio on the plot.
5. calculate the proportion of ratios that are as extreme or more
extreme than the original, this is the (approximate) p-value.
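
The steps above can be sketched as follows. The function name perm.iqr.test is made up, and measuring "as extreme or more extreme" by the absolute log ratio is an assumption of this sketch, not something stated in the steps. As discussed elsewhere in the thread, the null hypothesis being tested is that the two samples come from the same distribution.

```r
# Sketch only: perm.iqr.test is a hypothetical name.
perm.iqr.test <- function(x, y, nperm = 999) {
  obs <- IQR(x) / IQR(y)                 # step 1: observed ratio
  pool <- c(x, y)
  nx <- length(x)
  perm <- replicate(nperm, {             # steps 2 and 3: random splits
    idx <- sample(length(pool), nx)
    IQR(pool[idx]) / IQR(pool[-idx])
  })
  ratios <- c(obs, perm)                 # include the original value
  # step 5: two-sided, comparing |log ratio| (assumption of this sketch)
  p.value <- mean(abs(log(ratios)) >= abs(log(obs)))
  list(statistic = obs, ratios = ratios, p.value = p.value)
}

# Made-up example data
set.seed(2012)
x <- rnorm(25, 4, 1)
y <- rnorm(30, 4, 2)
res <- perm.iqr.test(x, y)
res$p.value

# step 4: histogram with a reference line at the observed ratio
if (interactive()) {
  hist(log(res$ratios), main = "Permutation distribution of log IQR ratio")
  abline(v = log(res$statistic), lty = 2)
}
```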


I think it is an 'exact' (but random) p-value.


On Fri, Jul 13, 2012 at 5:32 AM, Schaber, Jörg
<joerg.scha...@med.ovgu.de> wrote:

Hi,

I have two non-normal distributions and use interquartile ranges as a 
dispersion measure.
Now I am looking for a test, which tests whether the interquartile ranges from 
the two distributions are significantly different.
Any idea?

Thanks,

joerg



--
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


