Hi,

You could use library(data.table):

library(data.table)
x <- data.frame(A=rep(letters,2), B=rnorm(52), C=rnorm(52), D=rnorm(52))
res <- with(x, aggregate(cbind(B,C,D), by=list(A), mean))
colnames(res)[1] <- "A"
x1 <- data.table(x)
res2 <- x1[, list(B=mean(B), C=mean(C), D=mean(D)), by=A]
identical(res, data.frame(res2))
#[1] TRUE

Just for comparison:

set.seed(25)
xnew <- data.frame(A=rep(letters,1500), B=rnorm(39000), C=rnorm(39000), D=rnorm(39000))
system.time(resnew <- with(xnew, aggregate(cbind(B,C,D), by=list(A), mean)))
#   user  system elapsed
#  0.152   0.000   0.152

xnew1 <- data.table(xnew)
system.time(resnew1 <- xnew1[, list(B=mean(B), C=mean(C), D=mean(D)), by=A])
#   user  system elapsed
#  0.004   0.000   0.005

A.K.

----- Original Message -----
From: Martin Batholdy <batho...@googlemail.com>
To: "r-help@r-project.org" <r-help@r-project.org>
Cc:
Sent: Tuesday, December 25, 2012 11:34 AM
Subject: [R] aggregate / collapse big data frame efficiently

Hi,

I need to aggregate the rows of a data.frame by computing the mean of all rows that share the same level of one factor variable; here is sample code:

x <- data.frame(rep(letters,2), rnorm(52), rnorm(52), rnorm(52))
aggregate(x, list(x[,1]), mean)

My problem is that the actual data set is much bigger (120 rows and approximately 100,000 columns), and it takes very, very long -- at some point I just stopped it.

Is there anything that can be done to make the aggregate routine more efficient? Or is there a different approach that would work faster?

Thanks for any suggestions!

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
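Since Martin's real data is wide rather than long (120 rows, ~100,000 columns), a base-R matrix approach may also be worth trying -- rowsum() computes per-group column sums in compiled code, and dividing by the group counts gives the group means for every column at once. This is just a sketch, not benchmarked here; it assumes the grouping factor is in column 1 and all remaining columns are numeric:

# group means for all columns at once via rowsum()
m   <- as.matrix(x[, -1])          # numeric part of the data
grp <- x[, 1]                      # grouping factor
# rowsum(): one row of column sums per group level (sorted by level);
# dividing by the group sizes (recycled down the columns) yields means
res3 <- rowsum(m, grp) / as.vector(table(grp))

This avoids looping over 100,000 columns in R, which is where aggregate() spends its time on wide data.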