gauravbhatti <gaurav15984 <at> hotmail.com> writes:

> Hi,
> I have to calculate the V statistic for each row of a large data frame
> (28000 rows). I cannot use the multtest package for a paired Wilcoxon
> test. I have been using a for loop, which is slow. Is there a way to
> speed up the computation with another method, like apply or tapply?
>
> My data set looks like this:
>
>              11573_MB   11911_MB   11966_MB   12091_MB  12168_MB   12420_MB ...
> cg00000292 0.62123125 0.82663502 0.74687013 0.61774927 0.7337809 0.73203721
> cg00002426 0.63631315 0.64408750 0.61975158 0.72500713 0.5753110 0.65146526
> cg00003994 0.05035499 0.05189776 0.05882848 0.11198073 0.1313330 0.03883439
> cg00005847 0.13936423 0.14967690 0.31874454 0.15876243 0.1111117 0.15070058
> cg00006414 0.09059770 0.09915681 0.09952658 0.13955982 0.1757718 0.07566312
> cg00007981 0.05622769 0.04143790 0.07167018 0.08051046 0.1378107 0.05439999
> ...
>              11573_CB   11911_CB   11966_CB   12091_CB   12168_CB  12420_CB
> cg00000292 0.83059018 0.65396035 0.74519819 0.76007659 0.70335691 0.7857631
> cg00002426 0.61450928 0.59160923 0.69857198 0.73028911 0.71808719 0.6741295
> cg00003994 0.04223668 0.07910444 0.05416764 0.06156407 0.06381321 0.0643354
> cg00005847 0.13897704 0.06407313 0.20449931 0.15683154 0.18936196 0.1610695
> cg00006414 0.06520757 0.12243180 0.11380134 0.10957321 0.15759518 0.1236715
> cg00007981 0.04789030 0.11699024 0.07143036 0.05996888 0.10829510 0.1069037
> ...
>
> There are 12 columns and 27000 rows. I have to perform a paired test on
> each row (1:6 vs 7:12) and store the p-value and statistic in two
> columns. What's the fastest way?
>
> Gaurav Bhatti

Using a for loop is fine here (and basically unavoidable). If you need it
to be faster, use a matrix rather than a data.frame (i.e. make a matrix
from columns 1-12, which are all numeric and so do not need to be in a
data frame).

Below are versions using apply, sapply and an explicit for loop. There's
not much difference in speed, but the last one, in which the data is in a
data.frame with rownames, is much slower.

> d <- matrix(rnorm(12000), nrow=1000)
> system.time(ans <- apply(d, 1, function(row)
+     unlist(wilcox.test(row[1:6], row[7:12])[c("p.value","statistic")])))
   user  system elapsed
  2.660   0.064   2.730
> system.time(ans2 <- sapply(1:nrow(d), function(i)
+     unlist(wilcox.test(d[i,1:6], d[i,7:12])[c("p.value","statistic")])))
   user  system elapsed
  2.480   0.108   2.583
> system.time({
+     ans3 <- matrix(nrow=nrow(d), ncol=2)
+     for(i in 1:nrow(d)) {
+         ans3[i,] <- unlist(wilcox.test(d[i,1:6], d[i,7:12])[c("p.value","statistic")])
+     }
+ })
   user  system elapsed
  2.504   0.000   2.503
> d <- as.data.frame(d)
> rownames(d) <- paste(letters, 1:nrow(d))
> system.time(ans2 <- sapply(1:nrow(d), function(i)
+     unlist(wilcox.test(as.numeric(d[i,1:6]),
+                        as.numeric(d[i,7:12]))[c("p.value","statistic")])))
   user  system elapsed
  5.673   0.212   5.885

Dan
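P.S. To run this on your actual data, convert the data frame to a numeric
matrix once, up front, rather than coercing each row with as.numeric()
inside the loop, and transpose the apply() result to get the two columns
you asked for. A minimal sketch, assuming your data frame is called df (a
hypothetical name) and all 12 columns are numeric:

> m <- as.matrix(df)    # one-off conversion; the cg... rownames are kept
> res <- t(apply(m, 1, function(row)
+     unlist(wilcox.test(row[1:6], row[7:12])[c("p.value","statistic")])))
> # apply() returns a 2 x nrow(m) matrix, so t() gives one row per probe
> colnames(res) <- c("p.value", "statistic")
> head(res)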
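Also note that you asked for a paired test and the V statistic, but
wilcox.test() defaults to the unpaired two-sample test, whose statistic is
named W. If the MB and CB columns really are matched pairs, add the
standard paired = TRUE argument, which runs the signed-rank test and
reports V:

> res <- t(apply(m, 1, function(row)
+     unlist(wilcox.test(row[1:6], row[7:12],
+                        paired = TRUE)[c("p.value","statistic")])))
> # the statistic element is now named "statistic.V" rather than "statistic.W"
> colnames(res) <- c("p.value", "V")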