This takes about 2 secs for 1M rows:

> n <- 1000000
> exampledata <- data.frame(orderID = sample(floor(n / 5), n, replace = TRUE),
+     itemPrice = rpois(n, 10))
> require(data.table)
> # convert to data.table
> ed.dt <- data.table(exampledata)
> system.time(result <- ed.dt[
+     , list(total = sum(itemPrice))
+     , by = orderID
+     ]
+ )
   user  system elapsed
   1.30    0.05    1.34
>
> str(result)
Classes ‘data.table’ and 'data.frame':  198708 obs. of  2 variables:
 $ orderID: int  1 2 3 4 5 6 8 9 10 11 ...
 $ total  : num  49 37 72 92 50 76 34 22 65 39 ...
> head(result)
     orderID total
[1,]       1    49
[2,]       2    37
[3,]       3    72
[4,]       4    92
[5,]       5    50
[6,]       6    76
>
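If the running total within each order (what the loop in the original question builds up row by row) is wanted rather than one total per order, a base-R sketch without extra packages might look like the following. It uses the small example from the question. The choice of ave() and aggregate() here is only one possible approach and has not been timed on the 1M-row data. Note also that ave() groups by orderID wherever the rows sit, whereas the original loop compares each row with the previous one and so assumes the rows of an order are adjacent.

```r
# Small example data from the original question.
exampledata <- data.frame(
  orderID   = c(1, 1, 1, 2, 2, 3, 3, 3, 4),
  itemPrice = c(10, 17, 9, 12, 25, 10, 1, 9, 7)
)

# Running total within each order: ave() applies cumsum() to the prices of
# each orderID group and returns the values in the original row order.
exampledata$orderAmount <- ave(exampledata$itemPrice,
                               exampledata$orderID,
                               FUN = cumsum)

# One total per order, comparable to the data.table result above.
orderTotals <- aggregate(itemPrice ~ orderID, data = exampledata, FUN = sum)
names(orderTotals)[2] <- "total"

print(exampledata)
print(orderTotals)
```

Both calls are vectorised over the whole column, so no row-by-row assignment into the data frame is needed.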
On Wed, Aug 3, 2011 at 9:25 AM, Caroline Faisst <caroline.fai...@gmail.com> wrote:
> Hello there,
>
> I’m computing the total value of an order from the price of the order items
> using a “for” loop and the “ifelse” function. I do this on a large dataframe
> (close to 1m lines). The computation of this function is painfully slow: in
> 1min only about 90 rows are calculated.
>
> The computation time taken for a given number of rows increases with the
> size of the dataset, see the example with my function below:
>
> # small dataset: function performs well
>
> exampledata<-data.frame(orderID=c(1,1,1,2,2,3,3,3,4),itemPrice=c(10,17,9,12,25,10,1,9,7))
>
> exampledata[1,"orderAmount"]<-exampledata[1,"itemPrice"]
>
> system.time(for (i in 2:length(exampledata[,1]))
> {exampledata[i,"orderAmount"]<-ifelse(exampledata[i,"orderID"]==exampledata[i-1,"orderID"],exampledata[i-1,"orderAmount"]+exampledata[i,"itemPrice"],exampledata[i,"itemPrice"])})
>
> # large dataset: the very same computational task takes much longer
>
> exampledata2<-data.frame(orderID=c(1,1,1,2,2,3,3,3,4,5:2000000),itemPrice=c(10,17,9,12,25,10,1,9,7,25:2000020))
>
> exampledata2[1,"orderAmount"]<-exampledata2[1,"itemPrice"]
>
> system.time(for (i in 2:9)
> {exampledata2[i,"orderAmount"]<-ifelse(exampledata2[i,"orderID"]==exampledata2[i-1,"orderID"],exampledata2[i-1,"orderAmount"]+exampledata2[i,"itemPrice"],exampledata2[i,"itemPrice"])})
>
> Does someone know a way to increase the speed?
>
> Thank you very much!
>
> Caroline
>
>        [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?