On Friday 05 June 2009, Thomas Lumley wrote: > On Fri, 5 Jun 2009, Dylan Beaudette wrote: > > Is anyone aware of a rank-based, non-paired test such as the > > Krustal-Wallis, that can accommodate weights? > > You don't say what sort of weights, but basically, no. > > Whether you have precision weights or sampling weights, the test will no > longer be distribution-free. > > > Alternatively, would it make sense to simulate a dataset by duplicating > > observations in proportion to their weight, and then using the > > Krustal-Wallis test? > > No. > > -thomas > > Thomas Lumley Assoc. Professor, Biostatistics > tlum...@u.washington.edu University of Washington, Seattle
Thanks, Perhaps I am thinking about the problem incorrectly. Here are some more details. I have a series of values, each associated with a polygon that represents an area on the ground. A sample is made by collecting the polygons (i.e. the values associated with each polygon) and the area fraction of each polygon (i.e. the observation's "weight"). In this past I have used this method to compute a weighted average, or weighted standard deviation. However, I now have a case where a subset of individuals from the region are selected, and re-assigned an area fraction. I would like to evaluate if the subsets are "representative" of the original data-- so far I have been comparing weighted means. Unfortunately, the data are prone to having highly skewed distributions-- possibly making my comparisons of weighted-means invalid. I posted a question about this a while back, but just in case here are some data that approximately resemble the problem at hand: # generate some data # note lack of balance x.1 <- rnorm(n=100, mean=0, sd=1) x.2 <- rnorm(n=10, mean=4, sd=3) x.3 <- rnorm(n=10, mean=0, sd=1.5) # generate some weights that sum to 1 for each treatment # c/o this post: # https://stat.ethz.ch/pipermail/r-help/2008-July/167044.html # w.1 <- diff(c(0,sort(sample(seq(1,99,1),99,replace=T)),100)) / 100 w.2 <- diff(c(0,sort(sample(seq(1,99,1),9,replace=T)),100)) / 100 w.3 <- diff(c(0,sort(sample(seq(1,99,1),9,replace=T)),100)) / 100 # generate treatment labels: trt <- factor(c(rep('baseline', 100), rep(c('data1','data2'), each=10))) # combine into dataframe d <- data.frame(values=c(x.1,x.2,x.3), wts=c(w.1,w.2,w.3), treatment=trt) # compute means and wt.means d.means <- sapply(by(d, d$treatment, function(i) mean(i$value)), '[') d.wt.means <- sapply(by(d, d$treatment, function(i) weighted.mean(i$value, w=i$wts)), '[') # quick check: boxplot(values~ treatment, data=d, varwidth=TRUE, border=grey(0.5), main='symbol size is proportional to observation weight') points(1:3, d.means, col='green', pch=15, cex=1.2) points(1:3, d.wt.means, col='blue', pch=15, cex=1.2) points(jitter(as.numeric(d$treatment)), d$values, col='red', cex=sqrt(d$wts*100)) legend('topleft', legend=c('original data','mean','wt.mean'), pch=c(1,15,15), col=c('red','green','blue')) # another way of looking at the data: library(lattice) densityplot(~ values, groups=treatment, data=d, auto.key=list(columns=3), pch='|') # comparison of means (without weights): # effects are equal to treatment means summary(lm(values ~ treatment, data=d)) # comparison with means # effects are equal to treatment weighted-means summary(lm(values ~ treatment, data=d, weights=wts)) Cheers, Dylan -- Dylan Beaudette Soil Resource Laboratory http://casoilresource.lawr.ucdavis.edu/ University of California at Davis 530.754.7341 ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.