Hi,

I have a quick statistics-related question. I would like to perform a 
comparison of means across treatments using a linear model framework. 
However, my 'baseline' treatment has many more observations than either of 
the other two treatments and the variance within each treatment is different. 
Furthermore, each observation is assigned a weight-- where weights within a 
treatment sum to 1.

Can I safely use lm() to check differences in means, under these 
circumstances? I have posted an example below.


# generate some data
# note lack of balance
x.1 <- rnorm(n=100, mean=0, sd=1)
x.2 <- rnorm(n=10, mean=4, sd=3)
x.3 <- rnorm(n=10, mean=0, sd=1.5)

# generate some weights that sum to 1 for each treatment
# c/o this post: 
# https://stat.ethz.ch/pipermail/r-help/2008-July/167044.html
#
w.1 <- diff(c(0,sort(sample(seq(1,99,1),99,replace=T)),100)) / 100
w.2 <- diff(c(0,sort(sample(seq(1,99,1),9,replace=T)),100)) / 100
w.3 <- diff(c(0,sort(sample(seq(1,99,1),9,replace=T)),100)) / 100

# generate treatment labels:
trt <- factor(c(rep('baseline', 100), rep(c('data1','data2'), each=10)))

# combine into dataframe
d <- data.frame(values=c(x.1,x.2,x.3), wts=c(w.1,w.2,w.3), treatment=trt)

# compute means and wt.means
d.means <- sapply(by(d, d$treatment, function(i) mean(i$value)), '[')
d.wt.means <- sapply(by(d, d$treatment, function(i) weighted.mean(i$value, 
w=i$wts)), '[')


# quick check:
boxplot(values~ treatment, data=d, varwidth=TRUE, border=grey(0.5), 
main='symbol size is proportional to observation weight')
points(1:3, d.means, col='green', pch=15, cex=1.2)
points(1:3, d.wt.means, col='blue', pch=15, cex=1.2)
points(jitter(as.numeric(d$treatment)), d$values, col='red', 
cex=sqrt(d$wts*100))
legend('topleft', legend=c('original data','mean','wt.mean'), pch=c(1,15,15), 
col=c('red','green','blue'))

# comparison of means (without weights):
# effects are equal to treatment means
summary(lm(values ~ treatment, data=d))

# comparison with means
# effects are equal to treatment weighted-means
summary(lm(values ~ treatment, data=d, weights=wts))

Thanks,
Dylan


-- 
Dylan Beaudette
Soil Resource Laboratory
http://casoilresource.lawr.ucdavis.edu/
University of California at Davis
530.754.7341

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to