Hello,

Perhaps transpose the table with attach(as.data.frame(t(data))) and use the colSums() function, with order ID as the header.

-Ken Hutchison
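For reference, a minimal base-R sketch of a vectorised alternative to the row-by-row loop, using the small example data from the original post quoted below. It is only one possible approach and not necessarily what the suggestion above had in mind. It assumes that all rows of an order share the same orderID; note that ave() groups by orderID wherever the rows sit, whereas the loop only compares each row with the one before it, so the two agree when the data are sorted by orderID.

```r
# Small example data from the original post
exampledata <- data.frame(
  orderID   = c(1, 1, 1, 2, 2, 3, 3, 3, 4),
  itemPrice = c(10, 17, 9, 12, 25, 10, 1, 9, 7)
)

# One total per order; the result is a one-column matrix
# whose row names are the orderID values
totals <- rowsum(exampledata$itemPrice, exampledata$orderID)

# Running total within each order, i.e. the orderAmount column
# that the loop builds one row at a time
exampledata$orderAmount <- ave(exampledata$itemPrice,
                               exampledata$orderID,
                               FUN = cumsum)
```

Both calls work on whole columns at once, so they avoid the repeated single-cell assignments into the data frame that the loop performs.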
On Aug 3, 2554 BE, at 1:12 PM, David Winsemius <dwinsem...@comcast.net> wrote:

>
> On Aug 3, 2011, at 12:20 PM, jim holtman wrote:
>
>> This takes about 2 secs for 1M rows:
>>
>> > n <- 1000000
>> > exampledata <- data.frame(orderID = sample(floor(n / 5), n, replace = TRUE), itemPrice = rpois(n, 10))
>> > require(data.table)
>> > # convert to data.table
>> > ed.dt <- data.table(exampledata)
>> > system.time(result <- ed.dt[
>> +     , list(total = sum(itemPrice))
>> +     , by = orderID
>> +     ]
>> + )
>>    user  system elapsed
>>    1.30    0.05    1.34
>
> Interesting. Impressive. And I noted that the OP wanted what cumsum would
> provide and for some reason creating that longer result is even faster on my
> machine than the shorter result using sum.
>
> --
> David.
>> >
>> > str(result)
>> Classes ‘data.table’ and 'data.frame':  198708 obs. of  2 variables:
>>  $ orderID: int  1 2 3 4 5 6 8 9 10 11 ...
>>  $ total  : num  49 37 72 92 50 76 34 22 65 39 ...
>> > head(result)
>>      orderID total
>> [1,]       1    49
>> [2,]       2    37
>> [3,]       3    72
>> [4,]       4    92
>> [5,]       5    50
>> [6,]       6    76
>> >
>>
>> On Wed, Aug 3, 2011 at 9:25 AM, Caroline Faisst
>> <caroline.fai...@gmail.com> wrote:
>>> Hello there,
>>>
>>> I’m computing the total value of an order from the price of the order items
>>> using a “for” loop and the “ifelse” function. I do this on a large dataframe
>>> (close to 1m lines). The computation of this function is painfully slow: in
>>> 1min only about 90 rows are calculated.
>>>
>>> The computation time taken for a given number of rows increases with the
>>> size of the dataset, see the example with my function below:
>>>
>>> # small dataset: function performs well
>>>
>>> exampledata<-data.frame(orderID=c(1,1,1,2,2,3,3,3,4),itemPrice=c(10,17,9,12,25,10,1,9,7))
>>>
>>> exampledata[1,"orderAmount"]<-exampledata[1,"itemPrice"]
>>>
>>> system.time(for (i in 2:length(exampledata[,1]))
>>> {exampledata[i,"orderAmount"]<-ifelse(exampledata[i,"orderID"]==exampledata[i-1,"orderID"],exampledata[i-1,"orderAmount"]+exampledata[i,"itemPrice"],exampledata[i,"itemPrice"])})
>>>
>>> # large dataset: the very same computational task takes much longer
>>>
>>> exampledata2<-data.frame(orderID=c(1,1,1,2,2,3,3,3,4,5:2000000),itemPrice=c(10,17,9,12,25,10,1,9,7,25:2000020))
>>>
>>> exampledata2[1,"orderAmount"]<-exampledata2[1,"itemPrice"]
>>>
>>> system.time(for (i in 2:9)
>>> {exampledata2[i,"orderAmount"]<-ifelse(exampledata2[i,"orderID"]==exampledata2[i-1,"orderID"],exampledata2[i-1,"orderAmount"]+exampledata2[i,"itemPrice"],exampledata2[i,"itemPrice"])})
>>>
>>> Does someone know a way to increase the speed?
>>>
>>> Thank you very much!
>>>
>>> Caroline
>>>
>>> ______________________________________________
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>> --
>> Jim Holtman
>> Data Munger Guru
>>
>> What is the problem that you are trying to solve?
>
> David Winsemius, MD
> West Hartford, CT
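The remark quoted above, that the original poster wants what cumsum would provide, can be sketched with data.table as follows. This is an illustrative sketch rather than the code that was actually timed in the thread: the set.seed() call and the `check` table are additions for reproducibility and verification, and the data.table package is assumed to be installed.

```r
library(data.table)

# Same simulated data as in the quoted timing example
# (the seed is an addition so the run is reproducible)
set.seed(1)
n <- 1000000
ed.dt <- data.table(
  orderID   = sample(floor(n / 5), n, replace = TRUE),
  itemPrice = rpois(n, 10)
)

# Add the running total per order by reference; row order is unchanged
ed.dt[, orderAmount := cumsum(itemPrice), by = orderID]

# Sanity check: the last running total of each order equals its sum
check <- ed.dt[, list(total = sum(itemPrice), last = orderAmount[.N]),
               by = orderID]
```

Wrapping the `:=` line in system.time() gives a timing to compare against the sum() version on your own machine.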