Sorry about the lack of code, but using Davids example, would: tapply(itemPrice, INDEX=orderID, FUN=sum) work? -Ken Hutchison
On Aug 3, 2554 BE, at 2:09 PM, David Winsemius <dwinsem...@comcast.net> wrote: > > On Aug 3, 2011, at 2:01 PM, Ken wrote: > >> Hello, >> Perhaps transpose the table attach(as.data.frame(t(data))) and use ColSums() >> function with order id as header. >> -Ken Hutchison > > Got any code? The OP offered a reproducible example, after all. > > -- > David. >> >> On Aug 3, 2554 BE, at 1:12 PM, David Winsemius <dwinsem...@comcast.net> >> wrote: >> >>> >>> On Aug 3, 2011, at 12:20 PM, jim holtman wrote: >>> >>>> This takes about 2 secs for 1M rows: >>>> >>>>> n <- 1000000 >>>>> exampledata <- data.frame(orderID = sample(floor(n / 5), n, replace = >>>>> TRUE), itemPrice = rpois(n, 10)) >>>>> require(data.table) >>>>> # convert to data.table >>>>> ed.dt <- data.table(exampledata) >>>>> system.time(result <- ed.dt[ >>>> + , list(total = sum(itemPrice)) >>>> + , by = orderID >>>> + ] >>>> + ) >>>> user system elapsed >>>> 1.30 0.05 1.34 >>> >>> Interesting. Impressive. And I noted that the OP wanted what cumsum would >>> provide and for some reason creating that longer result is even faster on >>> my machine than the shorter result using sum. >>> >>> -- >>> David. >>>>> >>>>> str(result) >>>> Classes ‘data.table’ and 'data.frame': 198708 obs. of 2 variables: >>>> $ orderID: int 1 2 3 4 5 6 8 9 10 11 ... >>>> $ total : num 49 37 72 92 50 76 34 22 65 39 ... >>>>> head(result) >>>> orderID total >>>> [1,] 1 49 >>>> [2,] 2 37 >>>> [3,] 3 72 >>>> [4,] 4 92 >>>> [5,] 5 50 >>>> [6,] 6 76 >>>>> >>>> >>>> >>>> On Wed, Aug 3, 2011 at 9:25 AM, Caroline Faisst >>>> <caroline.fai...@gmail.com> wrote: >>>>> Hello there, >>>>> >>>>> >>>>> I’m computing the total value of an order from the price of the order >>>>> items >>>>> using a “for” loop and the “ifelse” function. I do this on a large >>>>> dataframe >>>>> (close to 1m lines). The computation of this function is painfully slow: >>>>> in >>>>> 1min only about 90 rows are calculated. >>>>> >>>>> >>>>> The computation time taken for a given number of rows increases with the >>>>> size of the dataset, see the example with my function below: >>>>> >>>>> >>>>> # small dataset: function performs well >>>>> >>>>> exampledata<-data.frame(orderID=c(1,1,1,2,2,3,3,3,4),itemPrice=c(10,17,9,12,25,10,1,9,7)) >>>>> >>>>> exampledata[1,"orderAmount"]<-exampledata[1,"itemPrice"] >>>>> >>>>> system.time(for (i in 2:length(exampledata[,1])) >>>>> {exampledata[i,"orderAmount"]<-ifelse(exampledata[i,"orderID"]==exampledata[i-1,"orderID"],exampledata[i-1,"orderAmount"]+exampledata[i,"itemPrice"],exampledata[i,"itemPrice"])}) >>>>> >>>>> >>>>> # large dataset: the very same computational task takes much longer >>>>> >>>>> exampledata2<-data.frame(orderID=c(1,1,1,2,2,3,3,3,4,5:2000000),itemPrice=c(10,17,9,12,25,10,1,9,7,25:2000020)) >>>>> >>>>> exampledata2[1,"orderAmount"]<-exampledata2[1,"itemPrice"] >>>>> >>>>> system.time(for (i in 2:9) >>>>> {exampledata2[i,"orderAmount"]<-ifelse(exampledata2[i,"orderID"]==exampledata2[i-1,"orderID"],exampledata2[i-1,"orderAmount"]+exampledata2[i,"itemPrice"],exampledata2[i,"itemPrice"])}) >>>>> >>>>> >>>>> >>>>> Does someone know a way to increase the speed? >>>>> >>>>> >>>>> Thank you very much! >>>>> >>>>> Caroline >>>>> >>>>> [[alternative HTML version deleted]] >>>>> >>>>> >>>>> ______________________________________________ >>>>> R-help@r-project.org mailing list >>>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>>> PLEASE do read the posting guide >>>>> http://www.R-project.org/posting-guide.html >>>>> and provide commented, minimal, self-contained, reproducible code. >>>>> >>>>> >>>> >>>> >>>> >>>> -- >>>> Jim Holtman >>>> Data Munger Guru >>>> >>>> What is the problem that you are trying to solve? >>>> >>>> ______________________________________________ >>>> R-help@r-project.org mailing list >>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>> PLEASE do read the posting guide >>>> http://www.R-project.org/posting-guide.html >>>> and provide commented, minimal, self-contained, reproducible code. >>> >>> David Winsemius, MD >>> West Hartford, CT >>> >>> ______________________________________________ >>> R-help@r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. > > David Winsemius, MD > West Hartford, CT > ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.