Hi Shantanu,

I saw your reply to Rui on Nabble regarding multiple underscores:
(Actually, I see now that part of the problem is that many of the names have multiple underscores such as "red_apple_pre" or "post_banana_organic". I think this is causing a problem for this line in your code:)

I wasn't aware of that problem. In that case, try this:

set.seed(432)
dat2<-data.frame(red_apple_pre=sample(10:20,5,replace=TRUE),
                 orange_post=sample(18:28,5,replace=TRUE),
                 pre_banana_organic=sample(25:35,5,replace=TRUE),
                 post_apple=sample(20:30,5,replace=TRUE),
                 banana_post=sample(40:50,5,replace=TRUE),
                 orange_pre=sample(5:10,5,replace=TRUE))
nam1<-c("apple","orange","banana")
nam2<-c("pre","post")
colnames(dat2)<-unlist(lapply(lapply(strsplit(colnames(dat2),"_"),
                                     function(x) x[x%in%nam1|x%in%nam2]),
                              function(x) paste(x[1],x[2],sep="_")))
colnames(dat2)<-gsub("^pre\\_(.*)","\\1_pre",gsub("^post\\_(.*)","\\1_post",colnames(dat2)))
dat3<-t(dat2[order(colnames(dat2))])
dat3<-data.frame(varName=gsub("(.*)\\_.*","\\1",row.names(dat3)),dat3)
list3<-lapply(split(dat3,dat3$varName),function(x) t(x[-1]))
res3<-do.call(rbind,lapply(lapply(list3,function(x) t.test(x[,1],x[,2],paired=TRUE)),
                           function(x) data.frame(meandifference=x$estimate,
                                                  CIlow=unlist(x$conf.int)[1],
                                                  CIhigh=unlist(x$conf.int)[2],
                                                  p.value=x$p.value)))
res3
#       meandifference     CIlow   CIhigh      p.value
#apple            12.6  8.519476 16.68052 0.0010166626
#banana           15.0 12.088040 17.91196 0.0001388506
#orange           18.2 13.604166 22.79583 0.0003888560

I hope this works.
A.K.

----- Original Message -----
From: "Nundy, Shantanu" <snu...@chicagobooth.edu>
To: arun <smartpink...@yahoo.com>
Cc:
Sent: Thursday, October 11, 2012 10:22 AM
Subject: RE: [R] multiple t-tests across similar variable names

hi Arun,

This is very helpful thanks. I'm running into a couple issues:
1. Since some of the variables start with "pre_apple" and others "apple_post" sorting the variables doesn't completely put pre-post variables next to each other.
2. I have about 50 variables so typing this line is a bit cumbersome:
> list3<-list(dat3[,1:2],dat3[,3:4],dat3[,5:6])

Thanks,
Shantanu
________________________________________
From: arun [smartpink...@yahoo.com]
Sent: Thursday, October 11, 2012 9:14 AM
To: Rui Barradas
Cc: Nundy, Shantanu; R help
Subject: Re: [R] multiple t-tests across similar variable names

HI Rui,

By running your code, I got the results as:
result
#       MeanDiff   CIlower    CIupper      p.value
#apple     -12.6 -16.68052  -8.519476 0.0010166626
#banana    -15.0 -17.91196 -12.088040 0.0001388506
#orange    -18.2 -22.79583 -13.604166 0.0003888560

From my code:
res3
#       meandifference     CIlow   CIhigh      p.value
#apple            12.6  8.519476 16.68052 0.0010166626
#banana           15.0 12.088040 17.91196 0.0001388506
#orange           18.2 13.604166 22.79583 0.0003888560

There is difference in signs.
A.K.

----- Original Message -----
From: Rui Barradas <ruipbarra...@sapo.pt>
To: arun <smartpink...@yahoo.com>; "Nundy, Shantanu" <snu...@chicagobooth.edu>
Cc: R help <r-help@r-project.org>
Sent: Thursday, October 11, 2012 9:25 AM
Subject: Re: [R] multiple t-tests across similar variable names

Hello,

I have a problem, with your data example my results are different.
I have changed the names of two of the variables, to allow for 'pre' and 'post' to be first in the names.

# auxiliary functions
ifswap <- function(x) if(x[1] %in% c("pre", "post")) x[2:1] else x
getpair <- function(i, post) post[ which(vmat[post, 1] == vmat[i, 1]) ]
makeLine <- function(h) c(MeanDiff = unname(h$estimate),
                          CIlower = h$conf.int[1],
                          CIupper = h$conf.int[2],
                          p.value = h$p.value)
doTests <- function(DF, Pairs){
    t.list <- lapply( seq_len(nrow(Pairs)), function(i)
        t.test(DF[, Pairs[i, 1]], DF[, Pairs[i, 2]], paired = TRUE) )
    do.call(rbind, lapply(t.list, makeLine))
}

# dataset
set.seed(432)
dat2 <- data.frame(apple_pre = sample(10:20,5,replace=TRUE),
                   orange_post = sample(18:28,5,replace=TRUE),
                   pre_banana = sample(25:35,5,replace=TRUE),    # here
                   apple_post = sample(20:30,5,replace=TRUE),
                   post_banana = sample(40:50,5,replace=TRUE),   # and here
                   orange_pre = sample(5:10,5,replace=TRUE))

#--------------------------------
# start processing the data.frame

# Make pairs of pre/post columns
vars <- names(dat2)
vmat <- do.call(rbind, strsplit(vars, "_"))
vmat <- t(apply(vmat, 1, ifswap))
pre <- which(vmat[, 2] == "pre")
post <- which(vmat[, 2] == "post")
post <- sapply(pre, getpair, post)
pairs <- matrix(c(pre, post), ncol = 2)

# now the tests
result <- doTests(dat2, pairs)
rownames(result) <- vmat[pre, 1]
result

In your results I believe that the values for meandifference are the means of x[, 1], at least that's what I've got.
Anyway, I'll see both codes again, to try to see what's going on.

Hope this helps,

Rui Barradas

On 11-10-2012 05:31, arun wrote:
> HI,
>
> If you have a lot of variables and in no order, then it would be better to
> order the data by column names.
> For e.g.
> set.seed(432)
> dat2<-data.frame(apple_pre=sample(10:20,5,replace=TRUE),orange_post=sample(18:28,5,replace=TRUE),banana_pre=sample(25:35,5,replace=TRUE),apple_post=sample(20:30,5,replace=TRUE),banana_post=sample(40:50,5,replace=TRUE),orange_pre=sample(5:10,5,replace=TRUE))
> dat3<-dat2[order(colnames(dat2))] #order the columns
> list3<-list(dat3[,1:2],dat3[,3:4],dat3[,5:6])
> res3<-do.call(rbind,lapply(lapply(list3,function(x)
> t.test(x[,1],x[,2],paired=TRUE)),function(x)
> data.frame(meandifference=x$estimate,CIlow=unlist(x$conf.int)[1],CIhigh=unlist(x$conf.int)[2],p.value=x$p.value)))
> row.names(res3)<-unlist(unique(lapply(strsplit(colnames(dat3),"_"),`[`,1)))
> res3
> #       meandifference     CIlow   CIhigh      p.value
> #apple            12.6  8.519476 16.68052 0.0010166626
> #banana           15.0 12.088040 17.91196 0.0001388506
> #orange           18.2 13.604166 22.79583 0.0003888560
>
> A.K.
>
>
>
> ----- Original Message -----
> From: "Nundy, Shantanu" <snu...@chicagobooth.edu>
> To: "r-help@r-project.org" <r-help@r-project.org>
> Cc:
> Sent: Wednesday, October 10, 2012 7:09 PM
> Subject: Re: [R] multiple t-tests across similar variable names
>
> Hi everyone-
>
> I have a dataset with multiple "pre" and "post" variables I want to compare.
> The variables are named "apple_pre" or "pre_banana" with the corresponding
> post variables named "apple_post" or "post_banana". The variables are in no
> particular order.
>
>           apple_pre orange_pre orange_post pre_banana apple_post post_banana
> person_1
> person_2
> person_3
> ...
> person_x
>
>
> How do I:
> 1. Run a series of paired t-tests for the apple_pre variables and pre_banana
> variables? Would be great to do something like ttest(*.*pre*.*,*.*post*.*).
> 2. Print the results from these t-tests in a table with col 1=mean
> difference, col 2= 95% conf interval, col 3=p-value.
>
> Thank you kindly,
> -Shantanu
>
> Shantanu Nundy, M.D.
> University of Chicago
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
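
A note on the sign difference reported in the thread between res3 and result: one likely explanation is column order. Sorting the column names alphabetically puts "apple_post" before "apple_pre", so t.test(x[,1], x[,2], paired=TRUE) on the sorted columns estimates post minus pre, while pairing the columns as (pre, post) estimates pre minus post. The sketch below uses small made-up data (the values and the variable names ordered, post_minus_pre and pre_minus_post are illustrative, not from the thread) to show that the two calls differ only in sign.

```r
# Hypothetical toy data, not from the thread: post is always larger than pre.
dat <- data.frame(apple_pre  = c(10, 12, 11, 14, 13),
                  apple_post = c(15, 18, 15, 21, 19))

# Alphabetical ordering puts "apple_post" before "apple_pre".
ordered <- dat[order(colnames(dat))]
colnames(ordered)

# First-minus-second on the sorted columns is post - pre (positive here).
post_minus_pre <- t.test(ordered[, 1], ordered[, 2], paired = TRUE)

# Naming the columns explicitly in (pre, post) order gives pre - post (negative).
pre_minus_post <- t.test(dat$apple_pre, dat$apple_post, paired = TRUE)

unname(post_minus_pre$estimate)
unname(pre_minus_post$estimate)

# The p-value is the same either way; only the sign of the estimate and
# the confidence limits flips.
c(post_minus_pre$p.value, pre_minus_post$p.value)
```

If a particular direction is wanted (for example post minus pre), selecting the two columns by name rather than by position after sorting keeps the direction explicit.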