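Dear Caroline,

Here is a faster and more elegant solution: compute the running total per order with cumsum() inside plyr's ddply() instead of looping over rows. On 10,000 rows this takes about 1.7 s versus about 12 s for your loop.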
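If you prefer to avoid the plyr dependency, base R's ave() (from the stats package, attached by default) can apply cumsum() within each orderID. The snippet below is a minimal sketch of that alternative, not the ddply() call timed below. It uses the small example from your message:

```r
# Small example data, copied from Caroline's message
exampledata <- data.frame(
  orderID   = c(1, 1, 1, 2, 2, 3, 3, 3, 4),
  itemPrice = c(10, 17, 9, 12, 25, 10, 1, 9, 7)
)

# ave() splits itemPrice by orderID, applies cumsum() within each group,
# and puts the results back in the original row order
exampledata$orderAmount <- ave(exampledata$itemPrice, exampledata$orderID,
                               FUN = cumsum)

print(exampledata$orderAmount)
# [1] 10 27 36 12 37 10 11 20  7
```

Because ave() keeps the original row order, the result can be assigned directly as a new column, with no merge or re-sorting step. This is a single vectorised pass, so it avoids the per-row data frame assignments that make the loop slow.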
> n <- 10000
> exampledata <- data.frame(orderID = sample(floor(n / 5), n, replace = TRUE),
+                           itemPrice = rpois(n, 10))
> library(plyr)
> system.time({
+   ddply(exampledata, .(orderID), function(x){
+     data.frame(itemPrice = x$itemPrice, orderAmount = cumsum(x$itemPrice))
+   })
+ })
   user  system elapsed
   1.67    0.00    1.69
> exampledata[1,"orderAmount"]<-exampledata[1,"itemPrice"]
> system.time(for (i in 2:length(exampledata[,1]))
+ {exampledata[i,"orderAmount"]<-ifelse(exampledata[i,"orderID"]==exampledata[i-1,"orderID"],exampledata[i-1,"orderAmount"]+exampledata[i,"itemPrice"],exampledata[i,"itemPrice"])})
   user  system elapsed
  11.94    0.02   11.97

Best regards,

Thierry

> -----Original message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
> On Behalf Of Caroline Faisst
> Sent: Wednesday 3 August 2011 15:26
> To: r-help@r-project.org
> Subject: [R] slow computation of functions over large datasets
>
> Hello there,
>
> I'm computing the total value of an order from the prices of the order items
> using a "for" loop and the "ifelse" function. I do this on a large data frame
> (close to 1m rows). The computation is painfully slow: in 1 min only about
> 90 rows are calculated.
>
> The computation time for a given number of rows increases with the size of
> the dataset; see the example with my function below:
>
> > # small dataset: function performs well
> > exampledata<-data.frame(orderID=c(1,1,1,2,2,3,3,3,4),itemPrice=c(10,17,9,12,25,10,1,9,7))
> > exampledata[1,"orderAmount"]<-exampledata[1,"itemPrice"]
> > system.time(for (i in 2:length(exampledata[,1]))
> {exampledata[i,"orderAmount"]<-ifelse(exampledata[i,"orderID"]==exampledata[i-1,"orderID"],exampledata[i-1,"orderAmount"]+exampledata[i,"itemPrice"],exampledata[i,"itemPrice"])})
>
> > # large dataset: the very same computational task takes much longer
> > exampledata2<-data.frame(orderID=c(1,1,1,2,2,3,3,3,4,5:2000000),itemPrice=c(10,17,9,12,25,10,1,9,7,25:2000020))
> > exampledata2[1,"orderAmount"]<-exampledata2[1,"itemPrice"]
> > system.time(for (i in 2:9)
> {exampledata2[i,"orderAmount"]<-ifelse(exampledata2[i,"orderID"]==exampledata2[i-1,"orderID"],exampledata2[i-1,"orderAmount"]+exampledata2[i,"itemPrice"],exampledata2[i,"itemPrice"])})
>
> Does someone know a way to increase the speed?
>
> Thank you very much!
>
> Caroline

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
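A caveat on equivalence: the for/ifelse loop restarts the running total whenever orderID differs from the previous row. Grouping by orderID, as ddply() does, pools all rows of an order wherever they appear. The two give the same totals only when each order's rows are contiguous. That holds in both of Caroline's examples but not necessarily in the random sample() data above. The following is a minimal base R sketch that reproduces the loop's row-by-row semantics without a loop. The data frame `d` and the column names `loopAmount` and `groupAmount` are hypothetical, chosen so that order 1 occurs in two separate runs:

```r
# Hypothetical data: order 1 appears in two non-contiguous runs
d <- data.frame(
  orderID   = c(1, 1, 2, 1),
  itemPrice = c(5, 3, 4, 2)
)

# Start a new run wherever orderID differs from the previous row,
# which is the same condition the for/ifelse loop tests
n   <- nrow(d)
run <- cumsum(c(TRUE, d$orderID[-1] != d$orderID[-n]))

# Running total within each run: mimics the loop
d$loopAmount  <- ave(d$itemPrice, run, FUN = cumsum)

# Running total within each orderID, wherever its rows appear
d$groupAmount <- ave(d$itemPrice, d$orderID, FUN = cumsum)

print(d)
#   orderID itemPrice loopAmount groupAmount
# 1       1         5          5           5
# 2       1         3          8           8
# 3       2         4          4           4
# 4       1         2          2          10
```

Row 4 shows the difference. The loop starts over at 2 because the previous row belongs to order 2, while grouping by orderID continues order 1's total to 10. Which behaviour is wanted depends on whether the data are guaranteed to be sorted by order.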