Actually, I see now that part of the problem is that many of the names contain more than one underscore,
such as "red_apple_pre" or "post_banana_organic". I think this is what breaks
this line in your code:
vmat <- do.call(rbind, strsplit(vars, "_"))
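One possible way around the multiple-underscore problem is to stop relying on every name splitting into exactly two pieces, and instead pull out the "pre"/"post" token and treat whatever is left as the variable name. The sketch below is only an illustration: the names in `vars` are made up, `token_re`, `is_pp`, `phase` and `base` are hypothetical helper names, and it assumes each name contains "pre" or "post" at most once as a whole underscore-delimited token.

```r
# Hypothetical names with more than one underscore, plus one constant column
vars <- c("red_apple_pre", "post_banana_organic", "red_apple_post",
          "pre_banana_organic", "site_id")

# "pre"/"post" must be a whole token delimited by "_" or the ends of the name
token_re <- "(^|_)(pre|post)(_|$)"
is_pp <- grepl(token_re, vars)

# Split only the pre/post names; the pieces may differ in number
parts <- strsplit(vars[is_pp], "_", fixed = TRUE)

# The tag is the piece equal to "pre" or "post"; the base name is the rest
phase <- vapply(parts, function(p) p[p %in% c("pre", "post")][1], character(1))
base  <- vapply(parts, function(p)
    paste(p[!p %in% c("pre", "post")], collapse = "_"), character(1))

# Same layout as 'vmat' after ifswap: name in column 1, tag in column 2
vmat <- cbind(base, phase)
rownames(vmat) <- vars[is_pp]
vmat
```

Note that the rows of this `vmat` refer to `vars[is_pp]`, not to all columns of the data frame, so any column indices derived from it apply to the data frame restricted to those columns.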
Shantanu
________________________________________
From: Nundy, Shantanu
Sent: Thursday, October 11, 2012 9:07 AM
To: Rui Barradas
Subject: RE: [R] multiple t-tests across similar variable names
Rui,
Thank you so much for your solution. It is exactly what I was struggling with!
One small question. When I ran the code on my actual dataset I got the error
below:
vars <- names(master)
vmat <- do.call(rbind, strsplit(vars, "_"))
Warning message:
In function (..., deparse.level = 1) :
number of columns of result is not a multiple of vector length (arg 1)
My guess is that the problem is that not all the variables have "pre" or "post" in them. Some of the
variables are constants that I will not run a paired t-test on. What would be the easiest way to get around this,
perhaps even by simply removing all of the variables that have neither "pre" nor "post" in them?
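A minimal sketch of that idea, assuming a made-up stand-in for `master` (the column names and values below are hypothetical): keep only the columns whose names contain "pre" or "post" as a whole underscore-delimited token, so that a name such as "pressure" would not be kept by accident.

```r
# Hypothetical stand-in for the real data frame 'master'
master <- data.frame(id = 1:3,
                     apple_pre = c(1, 2, 3), apple_post = c(2, 4, 5),
                     age = c(30, 41, 52),
                     pre_banana = c(5, 6, 7), post_banana = c(6, 6, 9))

# Names without "_" split into one piece, the others into two:
# unequal lengths are what make rbind() warn and recycle
lengths(strsplit(names(master), "_"))

# Keep only columns carrying a "pre" or "post" token
keep <- grepl("(^|_)(pre|post)(_|$)", names(master))
paired <- master[, keep, drop = FALSE]
names(paired)
```

The pairing code can then be run on `paired` instead of `master`.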
Thanks again,
Shantanu
________________________________________
From: arun [smartpink...@yahoo.com]
Sent: Thursday, October 11, 2012 8:50 AM
To: Rui Barradas
Cc: Nundy, Shantanu
Subject: Re: [R] multiple t-tests across similar variable names
Hi Rui,
Thanks for testing the code. I will look into it later.
A.K.
----- Original Message -----
From: Rui Barradas <ruipbarra...@sapo.pt>
To: arun <smartpink...@yahoo.com>; "Nundy, Shantanu" <snu...@chicagobooth.edu>
Cc: R help <r-help@r-project.org>
Sent: Thursday, October 11, 2012 9:25 AM
Subject: Re: [R] multiple t-tests across similar variable names
Hello,
I have a problem: with your data example, my results are different. I have
changed the names of two of the variables to allow 'pre' and 'post' to come
first in the names.
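# auxiliary functions

# put the variable name first and the "pre"/"post" tag second
ifswap <- function(x)
    if(x[1] %in% c("pre", "post")) x[2:1] else x

# index of the post column matching pre column i (uses the global 'vmat')
getpair <- function(i, post)
    post[ which(vmat[post, 1] == vmat[i, 1]) ]

# one row of the results table from an 'htest' object
makeLine <- function(h)
    c(MeanDiff = unname(h$estimate),
      CIlower = h$conf.int[1],
      CIupper = h$conf.int[2],
      p.value = h$p.value)

# paired t-test for each row (pre, post) of 'Pairs'
doTests <- function(DF, Pairs){
    t.list <- lapply( seq_len(nrow(Pairs)), function(i)
        t.test(DF[, Pairs[i, 1]], DF[, Pairs[i, 2]], paired = TRUE) )
    do.call(rbind, lapply(t.list, makeLine))
}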
# dataset
set.seed(432)
dat2 <- data.frame(apple_pre = sample(10:20,5,replace=TRUE),
                   orange_post = sample(18:28,5,replace=TRUE),
                   pre_banana = sample(25:35,5,replace=TRUE), # here
                   apple_post = sample(20:30,5,replace=TRUE),
                   post_banana = sample(40:50,5,replace=TRUE), # and here
                   orange_pre = sample(5:10,5,replace=TRUE))
#--------------------------------
# start processing the data.frame
# Make pairs of pre/post columns
vars <- names(dat2)
vmat <- do.call(rbind, strsplit(vars, "_"))
vmat <- t(apply(vmat, 1, ifswap))
pre <- which(vmat[, 2] == "pre")
post <- which(vmat[, 2] == "post")
post <- sapply(pre, getpair, post)
pairs <- matrix(c(pre, post), ncol = 2)
# now the tests
result <- doTests(dat2, pairs)
rownames(result) <- vmat[pre, 1]
result
In your results I believe that the values for meandifference are the means of
x[, 1], at least that's what I've got.
Anyway, I'll see both codes again, to try to see what's going on.
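As a quick check of what `t.test` reports in `$estimate`, the sketch below uses made-up data (`x`, `y` and `h` are hypothetical names, not taken from either code). With `paired = TRUE` the estimate is a single value, the mean of `x - y`; without it, the estimate holds the two group means instead.

```r
set.seed(1)
x <- rnorm(10, mean = 5)
y <- x + rnorm(10, mean = 2)

# Paired test: one estimate, the mean of the differences x - y
h <- t.test(x, y, paired = TRUE)
unname(h$estimate)
mean(x - y)
all.equal(unname(h$estimate), mean(x - y))

# Unpaired test: two estimates, the mean of x and the mean of y
length(t.test(x, y)$estimate)
```

The sign of the paired estimate therefore depends on which column is passed first.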
Hope this helps,
Rui Barradas
On 11-10-2012 05:31, arun wrote:
Hi,
If you have a lot of variables in no particular order, it would be better to
order the data by column names.
For example:
set.seed(432)
dat2 <- data.frame(apple_pre   = sample(10:20, 5, replace = TRUE),
                   orange_post = sample(18:28, 5, replace = TRUE),
                   banana_pre  = sample(25:35, 5, replace = TRUE),
                   apple_post  = sample(20:30, 5, replace = TRUE),
                   banana_post = sample(40:50, 5, replace = TRUE),
                   orange_pre  = sample(5:10, 5, replace = TRUE))
dat3 <- dat2[order(colnames(dat2))]  # order the columns
list3 <- list(dat3[, 1:2], dat3[, 3:4], dat3[, 5:6])
res3 <- do.call(rbind, lapply(
    lapply(list3, function(x) t.test(x[, 1], x[, 2], paired = TRUE)),
    function(x) data.frame(meandifference = x$estimate,
                           CIlow = unlist(x$conf.int)[1],
                           CIhigh = unlist(x$conf.int)[2],
                           p.value = x$p.value)))
row.names(res3) <- unlist(unique(lapply(strsplit(colnames(dat3), "_"), `[`, 1)))
res3
#        meandifference     CIlow   CIhigh      p.value
# apple            12.6  8.519476 16.68052 0.0010166626
# banana           15.0 12.088040 17.91196 0.0001388506
# orange           18.2 13.604166 22.79583 0.0003888560
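The column ranges in `list3` are written out by hand, which becomes tedious with many variables. One way to generalize this, sketched below under the assumption that every name has the form name_pre or name_post, is to group the ordered columns by their base name. The data frame `dat` and the names `groups` and `res` are made up for illustration. Because "apple_post" sorts before "apple_pre", the first column of each group is the post variable, so the difference is post minus pre.

```r
# Made-up data with name_pre / name_post columns in no particular order
dat <- data.frame(apple_pre   = c(10, 12, 11, 14),
                  orange_post = c(20, 25, 22, 27),
                  apple_post  = c(15, 18, 15, 21),
                  orange_pre  = c(6, 9, 7, 8))

# Order the columns, then group them by the name without its _pre/_post tag
dat <- dat[order(names(dat))]
groups <- split.default(dat, sub("_(pre|post)$", "", names(dat)))

# One paired t-test per group: first column (post) minus second column (pre)
res <- do.call(rbind, lapply(groups, function(g) {
    h <- t.test(g[[1]], g[[2]], paired = TRUE)
    data.frame(meandifference = unname(h$estimate),
               CIlow = h$conf.int[1],
               CIhigh = h$conf.int[2],
               p.value = h$p.value)
}))
res
```

This works for any number of pre/post pairs, as long as each base name has exactly one pre column and one post column.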
A.K.
----- Original Message -----
From: "Nundy, Shantanu" <snu...@chicagobooth.edu>
To: "r-help@r-project.org" <r-help@r-project.org>
Cc:
Sent: Wednesday, October 10, 2012 7:09 PM
Subject: Re: [R] multiple t-tests across similar variable names
Hi everyone-
I have a dataset with multiple "pre" and "post" variables I want to compare. The variables are named
"apple_pre" or "pre_banana" with the corresponding post variables named "apple_post" or
"post_banana". The variables are in no particular order.
          apple_pre  orange_pre  orange_post  pre_banana  apple_post  post_banana
person_1
person_2
person_3
...
person_x
How do I:
1. Run a series of paired t-tests for the apple_pre variables and pre_banana
variables? Would be great to do something like ttest(*.*pre*.*,*.*post*.*).
2. Print the results from these t-tests in a table with col 1=mean difference,
col 2= 95% conf interval, col 3=p-value.
Thank you kindly,
-Shantanu
Shantanu Nundy, M.D.
University of Chicago
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.