Re: [R] multiple t-tests across similar variable names

Rui Barradas Thu, 11 Oct 2012 07:54:48 -0700

Hello,

If that is the problem now, then change the variables' names.

In what follows, the first line is just the example you gave. In the actual running code, uncomment the commented-out lines.


vars <-  c("red_apple_pre", "post_banana_organic")
#vars <- names(dat)
vars <- gsub("_pre", "=pre", vars)
vars <- gsub("_post", "=post", vars)
vars <- gsub("pre_", "pre=", vars)
vars <- gsub("post_", "post=", vars)
vars <- gsub("_", "\\.", vars)
vars <- sub("=", "_", vars)
#names(dat) <- vars
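
As a quick check, the same renaming steps can be wrapped in a function and followed by the strsplit() call to confirm that every name now splits into exactly two pieces. This is only a sketch: fix_names() is a hypothetical wrapper, not part of the code above, and it assumes that 'pre'/'post' sit at the very start or very end of each name.

```r
# fix_names() is a hypothetical wrapper around the renaming steps above.
# Assumption: "pre"/"post" appear only at the start or end of a name.
fix_names <- function(vars) {
    vars <- gsub("_pre", "=pre", vars)      # protect a trailing _pre
    vars <- gsub("_post", "=post", vars)    # protect a trailing _post
    vars <- gsub("pre_", "pre=", vars)      # protect a leading pre_
    vars <- gsub("post_", "post=", vars)    # protect a leading post_
    vars <- gsub("_", ".", vars, fixed = TRUE)  # remaining underscores become dots
    sub("=", "_", vars)                     # restore the single separator
}

vars <- fix_names(c("red_apple_pre", "post_banana_organic", "apple_pre"))
vars
# "red.apple_pre" "post_banana.organic" "apple_pre"

# every name now has exactly one underscore, so rbind gives a 2-column matrix
vmat <- do.call(rbind, strsplit(vars, "_"))
dim(vmat)
```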

Rui Barradas
Em 11-10-2012 15:17, Nundy, Shantanu escreveu:

Actually, I see now that part of the problem is that many of the names have multiple underscores 
such as "red_apple_pre" or "post_banana_organic". I think this is causing a 
problem for this line in your code:

vmat <- do.call(rbind, strsplit(vars, "_"))
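
A minimal sketch of why that line complains, using made-up names rather than the real dataset: splitting on every underscore returns pieces of different lengths, and rbind() then has to recycle the shorter ones.

```r
# made-up names; the first has one underscore, the others have two
vars <- c("apple_pre", "red_apple_pre", "post_banana_organic")
pieces <- strsplit(vars, "_")

# number of pieces per name: 2, 3 and 3, so the rows are ragged
n <- unname(sapply(pieces, length))
n

# TRUE when do.call(rbind, pieces) cannot form a clean matrix
ragged <- length(unique(n)) > 1
ragged
```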

Shantanu

________________________________________
From: Nundy, Shantanu
Sent: Thursday, October 11, 2012 9:07 AM
To: Rui Barradas
Subject: RE: [R] multiple t-tests across similar variable names

Rui,
Thank you so much for your solution. It is exactly what I was struggling with!

One small question. When I ran the code on my actual dataset I got the warning 
below:

vars <- names(master)
vmat <- do.call(rbind, strsplit(vars, "_"))

Warning message:
In function (..., deparse.level = 1)  :
   number of columns of result is not a multiple of vector length (arg 1)

My guess is that the problem is that not all of the variables have "pre" or "post" in them. Some of the 
variables are constants that I will not run a paired t-test on. What would be the easiest way to get around this, 
perhaps even by simply removing all of the variables that have neither "pre" nor "post" in them?
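
One possible way to drop those columns, sketched with a small made-up data frame standing in for 'master' (the column names here are invented for illustration). It assumes the "pre"/"post" token is always at the start or the end of the name, separated by an underscore.

```r
# made-up stand-in for 'master'; 'id' and 'site' play the constant columns
master <- data.frame(id = 1:3,
                     apple_pre = c(1, 2, 3),
                     apple_post = c(2, 3, 5),
                     site = c("a", "b", "c"),
                     pre_banana = c(4, 5, 6),
                     post_banana = c(5, 7, 6))

# TRUE for names that start with pre_/post_ or end with _pre/_post
keep <- grepl("^(pre|post)_|_(pre|post)$", names(master))

# drop = FALSE keeps a data.frame even if only one column survives
master2 <- master[, keep, drop = FALSE]
names(master2)
```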

Thanks again,
Shantanu







________________________________________
From: arun [smartpink...@yahoo.com]
Sent: Thursday, October 11, 2012 8:50 AM
To: Rui Barradas
Cc: Nundy, Shantanu
Subject: Re: [R] multiple t-tests across similar variable names

Hi Rui,

  Thanks for testing the code. I will look into it later.
A.K.




----- Original Message -----
From: Rui Barradas <ruipbarra...@sapo.pt>
To: arun <smartpink...@yahoo.com>; "Nundy, Shantanu" <snu...@chicagobooth.edu>
Cc: R help <r-help@r-project.org>
Sent: Thursday, October 11, 2012 9:25 AM
Subject: Re: [R] multiple t-tests across similar variable names

Hello,

I have a problem: with your data example, my results are different. I have 
changed the names of two of the variables, to allow for 'pre' and 'post' to 
come first in the names.

# auxiliary functions
ifswap <- function(x)
     if(x[1] %in% c("pre", "post")) x[2:1] else x

getpair <- function(i, post)
     post[ which(vmat[post, 1] == vmat[i, 1]) ]

makeLine <- function(h)
     c(MeanDiff = unname(h$estimate),
         CIlower = h$conf.int[1],
         CIupper = h$conf.int[2],
         p.value = h$p.value)

doTests <- function(DF, Pairs){
     t.list <- lapply( seq_len(nrow(Pairs)), function(i)
         t.test(DF[, Pairs[i, 1]], DF[, Pairs[i, 2]], paired = TRUE) )
     do.call(rbind, lapply(t.list, makeLine))
}

# dataset
set.seed(432)
dat2 <- data.frame(apple_pre = sample(10:20,5,replace=TRUE),
             orange_post = sample(18:28,5,replace=TRUE),
             pre_banana = sample(25:35,5,replace=TRUE),  # here
             apple_post = sample(20:30,5,replace=TRUE),
             post_banana = sample(40:50,5,replace=TRUE), # and here
             orange_pre = sample(5:10,5,replace=TRUE))


#--------------------------------
# start processing the data.frame
# Make pairs of pre/post columns
vars <- names(dat2)
vmat <- do.call(rbind, strsplit(vars, "_"))
vmat <- t(apply(vmat, 1, ifswap))
pre <- which(vmat[, 2] == "pre")
post <- which(vmat[, 2] == "post")
post <- sapply(pre, getpair, post)
pairs <- matrix(c(pre, post), ncol = 2)

# now the tests
result <- doTests(dat2, pairs)
rownames(result) <- vmat[pre, 1]
result
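
For comparison, here is a compact sketch of a different pairing technique: instead of splitting the names into a matrix, it strips the pre/post token with a regular expression and matches columns on what is left. pair_ttests() is a hypothetical name, not something from the code above, and the sketch assumes the token is at the start or end of each name. It also tolerates extra underscores and columns that are neither pre nor post.

```r
# pair_ttests() is a hypothetical helper, not part of the thread's code.
# Assumption: names look like pre_x / post_x or x_pre / x_post.
pair_ttests <- function(DF) {
    vars <- names(DF)
    is_pre  <- grepl("^pre_|_pre$", vars)
    is_post <- grepl("^post_|_post$", vars)
    # stem = name with the leading or trailing pre/post token removed
    stem <- sub("^(pre|post)_|_(pre|post)$", "", vars)
    # only stems that have both a pre and a post column are tested
    stems <- intersect(stem[is_pre], stem[is_post])
    t(sapply(stems, function(s) {
        x <- DF[[vars[is_pre & stem == s]]]
        y <- DF[[vars[is_post & stem == s]]]
        h <- t.test(x, y, paired = TRUE)   # pre first, as in doTests()
        c(MeanDiff = unname(h$estimate),
          CIlower = h$conf.int[1],
          CIupper = h$conf.int[2],
          p.value = h$p.value)
    }))
}

# small made-up example; 'id' is ignored because it has no pre/post token
dat <- data.frame(apple_pre = c(1, 2, 3, 4),
                  apple_post = c(2, 4, 5, 8),
                  pre_banana = c(5, 6, 7, 9),
                  post_banana = c(6, 6, 9, 10),
                  id = 1:4)
res <- pair_ttests(dat)
res
```

With pre passed first, MeanDiff is the mean of pre minus post, so it is negative whenever the post values are larger.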


In your results, I believe that the values for meandifference are the means of 
x[, 1]; at least that's what I've got.
Anyway, I'll go over both versions of the code again to try to see what's going on.

Hope this helps,

Rui Barradas

Em 11-10-2012 05:31, arun escreveu:

Hi,

If you have a lot of variables in no particular order, then it would be better to 
order the data by column names.
For example:
set.seed(432)
dat2 <- data.frame(apple_pre = sample(10:20, 5, replace = TRUE),
                   orange_post = sample(18:28, 5, replace = TRUE),
                   banana_pre = sample(25:35, 5, replace = TRUE),
                   apple_post = sample(20:30, 5, replace = TRUE),
                   banana_post = sample(40:50, 5, replace = TRUE),
                   orange_pre = sample(5:10, 5, replace = TRUE))
dat3 <- dat2[order(colnames(dat2))]  # order the columns
list3 <- list(dat3[, 1:2], dat3[, 3:4], dat3[, 5:6])
res3 <- do.call(rbind, lapply(
    lapply(list3, function(x) t.test(x[, 1], x[, 2], paired = TRUE)),
    function(x) data.frame(meandifference = x$estimate,
                           CIlow = unlist(x$conf.int)[1],
                           CIhigh = unlist(x$conf.int)[2],
                           p.value = x$p.value)))
row.names(res3) <- unlist(unique(lapply(strsplit(colnames(dat3), "_"), `[`, 1)))
res3
#     meandifference     CIlow   CIhigh      p.value
#apple            12.6  8.519476 16.68052 0.0010166626
#banana           15.0 12.088040 17.91196 0.0001388506
#orange           18.2 13.604166 22.79583 0.0003888560
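
The hard-coded 1:2, 3:4, 5:6 only works for three pairs. A rough sketch of how the grouping could be derived from the sorted names instead, assuming every name ends in _pre or _post as in this example (it would not handle names like pre_banana):

```r
cols <- sort(c("apple_pre", "orange_post", "banana_pre",
               "apple_post", "banana_post", "orange_pre"))

# stem = column name without its trailing _pre/_post
stems <- sub("_(pre|post)$", "", cols)

# positions of the columns belonging to each stem
idx <- split(seq_along(cols), stems)

# guard: every stem must have exactly two columns before testing
all_paired <- all(sapply(idx, length) == 2)
idx
```

Each element of idx could then replace one of the dat3[, 1:2]-style subsets.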

A.K.



----- Original Message -----
From: "Nundy, Shantanu" <snu...@chicagobooth.edu>
To: "r-help@r-project.org" <r-help@r-project.org>
Cc:
Sent: Wednesday, October 10, 2012 7:09 PM
Subject: Re: [R] multiple t-tests across similar variable names

Hi everyone-

I have a dataset with multiple "pre" and "post" variables I want to compare. The variables are named 
"apple_pre" or "pre_banana" with the corresponding post variables named "apple_post" or 
"post_banana". The variables are in no particular order.

          apple_pre orange_pre orange_post pre_banana apple_post post_banana
person_1
person_2
person_3
...
person_x


How do I:
1. Run a series of paired t-tests for the apple_pre variables and pre_banana 
variables? Would be great to do something like ttest(*.*pre*.*,*.*post*.*).
2. Print the results from these t-tests in a table with col 1=mean difference, 
col 2= 95% conf interval, col 3=p-value.

Thank you kindly,
-Shantanu

Shantanu Nundy, M.D.
University of Chicago


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

