If you have a large data frame, one option is package data.table. Try the following:
library(data.table) dt <- data.table(df) dt[, list(qsum = sum(quantity)), by ='client, date, name'] client date name qsum [1,] 1 2010-01-01 one 45 [2,] 1 2011-01-01 one 4500 [3,] 2 2010-01-01 two 40 [4,] 2 2011-01-01 two 4000 [5,] 3 2010-01-01 three 20 [6,] 3 2011-01-01 three 2000 BTW, the leading comma after the opening bracket is not a typo :) For R versions >= 2.11.x, aggregate() has a formula interface that saves a fair bit of typing: aggregate(quantity ~ client + date + name, data = df, FUN = sum) client date name quantity 1 1 2010-01-01 one 45 2 1 2011-01-01 one 4500 3 3 2010-01-01 three 20 4 3 2011-01-01 three 2000 5 2 2010-01-01 two 40 6 2 2011-01-01 two 4000 A third option is package plyr and function ddply(): library(plyr) ddply(df, .(client, date, name), summarise, qsum = sum(quantity)) # same output as data.table Hopefully one or more of these will improve your processing time. Dennis On Wed, Jan 26, 2011 at 2:39 AM, analys...@hotmail.com < analys...@hotmail.com> wrote: > I have > > > df > quantity branch client date name > 1 10 1 1 2010-01-01 one > 2 20 2 1 2010-01-01 one > 3 30 3 2 2010-01-01 two > 4 15 4 1 2010-01-01 one > 5 10 5 2 2010-01-01 two > 6 20 6 3 2010-01-01 three > 7 1000 1 1 2011-01-01 one > 8 2000 2 1 2011-01-01 one > 9 3000 3 2 2011-01-01 two > 10 1500 4 1 2011-01-01 one > 11 1000 5 2 2011-01-01 two > 12 2000 6 3 2011-01-01 three > > I want to aggregate away the branch. I followed a suggestion by Gabor > (thanks) and did > > > > aggregate(list(quantity=df$quantity),list(client=df$client,date=df$date),sum) > client date quantity > 1 1 2010-01-01 45 > 2 2 2010-01-01 40 > 3 3 2010-01-01 20 > 4 1 2011-01-01 4500 > 5 2 2011-01-01 4000 > 6 3 2011-01-01 2000 > > I want df$name also in the output and did what looked obvious: > > > > aggregate(list(quantity=df$quantity),list(client=df$client,date=df$date,name=df$name),sum) > client date name quantity > 1 1 2010-01-01 one 45 > 2 1 2011-01-01 one 4500 > 3 3 2010-01-01 three 20 > 4 3 2011-01-01 three 2000 > 5 2 2010-01-01 two 40 > 6 2 2011-01-01 two 4000 > > It seems to work, but slows down tremendously for a dataframe with > around a 1000 rows. > > Could anyone explain what is going on and suggest a way out? > > Thanks. > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > [[alternative HTML version deleted]] ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.