[R] Comparing spatial distributions - permutation test implementation

JiHO Wed, 20 May 2009 08:37:31 -0700

Hello everyone,

I am looking at the joint spatial distribution of 2 kinds of organisms(estimated on a grid of points) and want to test for significantassociation or dissociation.

My first question is: do you know a nice technique to do that,considering that I have a limited number of points (36) but that theyare repeated (4 times)? I did GLMs to test for correlations betweenthe two (hence forgetting about the spatial aspect of it) and waspreviously pointed to the SADIE software. Would there be anythingexplicitly spatial and available in R please?

Then, Syrjala's test[1] seems appropriate and tests for differences indistribution. It computes a Cramér-von Mises-type statistic and testsits significance with a permutation test.I implemented the test in R and posted the code on these mailinglists[2]. Some people checked it and confirmed that the statisticgives correct results but my estimation of the p-value does not matchthe one predicted with the orignal software from Syrjala. I don't knowwhat I am doing wrong. The permutation test is described by Syrjala as:


        (...) Under the null hypothesis,
        at a given sampling location (x_k, y_k), either density ob-
        servation y_i(x_k, y_k), i = 1, 2, is equally likely for each
        population. Thus, for a given data set, the distribution
        of the test statistic can be constructed by calculating
        the value of the test statistic for all 2^k pairwise per-
        mutations of the data set. (...) The level of signif-
        icance of a specific realization of the test statistic T is
        determined from its position in the ordered set of test
        statistic values from all 2^k permutations. (...)

My understanding is that, for each permutation I should choose arandom number of points (between 1 and k), swap the values for species1 and species 2 at those points, and recompute the test on the newdata. But this does not work :/ . Here is my code and associated datafrom Syrjala (for which I have reference values). Any advice would bevery welcome (in particular if there is a way to leverage boot() forthis).NB: computing the 1000 permutations can be a bit lengthy, butfortunately, by using plyr, you get a nice progress bar to look at!


syrjala.stat <- function(x, y=NULL, var1=NULL, var2=NULL)
#
#       Compute Syrjala statistic
#       x, y                    coordinates
#       var1, var2      value of 2 parameters both measured at (x,y) points

# NB: x can also be a data.frame/matrix containing x,y,var1,var2 ascolumns

#
{
        # Input checks
        if (!is.null(ncol(x))) {
                if (ncol(x) == 4) {
                        names(x) = c("x","y","var1","var2")
                        dat = x
                } else {
                        stop("Wrong number of columns in argument x")
                }
        } else {
                dat = data.frame(x, y, var1, var2)
        }

        # Normalize abundances
        dat$var1 = dat$var1/sum(dat$var1)
        dat$var2 = dat$var2/sum(dat$var2)

        # For each point (each line of dat)
        # compute the squared difference in gammas from each origin
        meanSqDiff = apply(dat, 1, function(d, coord, variab) {
                north = (coord$x>=d[1])
                east = (coord$y>=d[2])
                south = (coord$x<=d[1])
                west = (coord$y<=d[2])
                return( mean( c(
                        (diff(sapply(variab[(north & east),], sum)))^2,
                        (diff(sapply(variab[(south & east),], sum)))^2,
                        (diff(sapply(variab[(south & west),], sum)))^2,
                        (diff(sapply(variab[(north & west),], sum)))^2
                        ) )
                )
        }, dat[,c("x","y")], dat[,c("var1","var2")])

        # Compute the statistic (i.e. sum of mean squared differences)
        return(sum(meanSqDiff))
}


# Get data online : http://dl.getdropbox.com/u/1047321/syrjala_data_cod.csv

system("curl http://dl.getdropbox.com/u/1047321/syrjala_data_cod.csv >syrjala_data_cod.csv")


dataCod = read.csv(file = "syrjala_data_cod.csv", header = TRUE)

# Normalize abundances
dataCod$var1 = dataCod$var1/sum(dataCod$var1)
dataCod$var2 = dataCod$var2/sum(dataCod$var2)

# Number of permutations
nperm = 1000

# Create nperm-1 replicates of the data (one is the originalobservation)

d = rep(list(dataCod), nperm-1)

# Compute number of observations before to avoid doing that for everyreplicate

n = nrow(dataCod)

require(plyr)

# Permute some observations and compute the syrjala stat for eachpermutation

psis = ldply(d, .fun=function(x, n){
        # choose indices of observations to swap
        idx = sample(1:n, runif(1, min=1, max=n))
        # swap observations
        x[idx, 3:4] = x[idx, 4:3]
        # compute syrjala stat
        return(syrjala.stat(x))
}, n, .progress="text")
}

# Compute the syrjala stat for the observations
psi = syrjala.stat(dataCod)

# Estimate the pvalue
pvalue = (sum(psis>=psi)+1)/nperm

psi
pvalue
# Should be:
# statistic     = 0.224
# p-value       = 0.1900

Thank you very much in advance. Sincerely,

JiHO
---
http://jo.irisson.free.fr/

[1] A statistical test for a difference between the spatialdistributions of two populations. Syrjala SE. Ecology. 1996;77(1):75–80.

http://dl.getdropbox.com/u/1047321/Syrjala1996.pdf

[2] https://stat.ethz.ch/pipermail/r-sig-geo/2008-February/thread.html#3137


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Comparing spatial distributions - permutation test implementation

Reply via email to