According to the way that you have used 'aggregate', you are taking the column means. Couple of suggestions for faster processing:
1. use matrices instead of data.frames ( i converted your example just before using it) 2, use the 'colMeans' I created a 120 x 100000 matrix with 10 levels and its does the computation in less than 2 seconds: > n <- 100000 > nLevels <- 10 > nRows <- 120 > Cols <- list(rep(list(sample(nRows)), n)) > df <- data.frame(levels = sample(nLevels, nRows, TRUE), Cols) > colnames(df)[-1] <- paste0('col', 1:n) > > # convert to matrix for faster processing > df.m <- as.matrix(df[, -1]) # remove levels column > str(df.m) int [1:120, 1:100000] 111 13 106 61 16 39 25 94 53 38 ... - attr(*, "dimnames")=List of 2 ..$ : NULL ..$ : chr [1:100000] "col1" "col2" "col3" "col4" ... > system.time({ + # split the indices of rows for each level + x <- split(seq(nrow(df)), df$levels) + result <- sapply(x, function(a) colMeans(df.m[a, ])) + }) user system elapsed 1.33 0.00 1.35 > str(result) num [1:100000, 1:10] 57 57 57 57 57 57 57 57 57 57 ... - attr(*, "dimnames")=List of 2 ..$ : chr [1:100000] "col1" "col2" "col3" "col4" ... ..$ : chr [1:10] "1" "2" "3" "4" ... > On Tue, Dec 25, 2012 at 11:34 AM, Martin Batholdy <batho...@googlemail.com> wrote: > Hi, > > > I need to aggregate rows of a data.frame by computing the mean for rows with > the same factor-level on one factor-variable; > > here is the sample code: > > > x <- data.frame(rep(letters,2), rnorm(52), rnorm(52), rnorm(52)) > > aggregate(x, list(x[,1]), mean) > > > Now my problem is, that the actual data-set is much bigger (120 rows and > approximately 100.000 columns) – and it takes very very long (actually at > some point I just stopped it). > > Is there anything that can be done to make the aggregate routine more > efficient? > Or is there a different approach that would work faster? > > > Thanks for any suggestions! > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Jim Holtman Data Munger Guru What is the problem that you are trying to solve? Tell me what you want to do, not how you want to do it. ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.