Chris Conner wrote on 09/26/2011 11:33:05 AM:

> Help-Rs
>
> As someone who is newer to R and trying to make the transition from
> Access into R, there is a frequently used function that I'd like to
> try and duplicate in the R world. It involves creating an aggregate
> table of the top (n) orders for an item by sum of cost over a
> select period of time.
>
> So, take the following example:
>
> group <- c(rep(1,10), rep(2,10), rep(3,10))
> product <- c(rep("itema", 4), rep("itemb", 2), rep("itemc", 1), rep("itemc", 3),
>              rep("itema", 4), rep("itemb", 2), rep("itemc", 1), rep("itemc", 3),
>              rep("itema", 4), rep("itemb", 2), rep("itemc", 1), rep("itemc", 3))
> cost <- round(rnorm(30, mean = 100, sd = 30), 2)
> DF <- data.frame(group, product, cost)
> agglist <- list(DF$product, DF$group)
> col1 <- aggregate(DF[,3], by = agglist, sum)
> col2 <- aggregate(DF[,3], by = agglist, length)
> (table <- cbind(col1, col2))
>
> My question would be, how about if you wanted a table that retained
> only the top 1 product (e.g., item c for group 2) by group... or for
> that matter the top n=2 or n=3 or n=5? While with this example DF
> the answer would be easy to find, I'm dealing with millions of orders.
> THX!
> Chris
This may not be the most elegant way, but it works.

group <- rep(1:3, rep(10, 3))
product <- c("itema", "itemb", "itemc")[rep(rep(1:3, c(4, 2, 4)), 3)]
cost <- round(rnorm(30, mean=100, sd=30), 2)
DF <- data.frame(group, product, cost)

# create a data frame with total cost (and number of orders) per product and group
totcost <- aggregate(data.frame(cost.sum=DF$cost), by=DF[, c("product", "group")], sum)
totcost$n <- aggregate(DF$cost, by=DF[, c("product", "group")], length)$x

# rank the products by total cost within a group
# (unlist() lines up with the rows because aggregate() returns them sorted by group,
# the last "by" variable)
totcost$rank <- unlist(tapply(totcost$cost.sum, totcost$group, rank))
totcost.ordered <- totcost[order(totcost$group, totcost$rank), ]

# to see the top 2 ranked products by cost (the two cheapest)
totcost.ordered[totcost.ordered$rank <= 2, ]

If you want the most expensive (instead of the cheapest), redefine rank as:

totcost$rank <- unlist(tapply(-totcost$cost.sum, totcost$group, rank))

Jean

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
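A possible variation on the approach in the reply, sketched under assumptions: `top_n_by_group` is a hypothetical helper name, not part of base R or of the original post. It keeps the n most expensive products per group directly. It uses ave() instead of unlist(tapply(...)), so the ranks stay aligned with the rows of the aggregated table whatever order aggregate() returns them in. Ties are broken by position (ties.method = "first"), which is an arbitrary choice that guarantees at most n rows per group.

```r
# Hypothetical helper (not from the original thread): keep the n products with
# the largest total cost within each group.
top_n_by_group <- function(DF, n = 1) {
  # total cost per product/group combination
  totcost <- aggregate(cost ~ product + group, data = DF, FUN = sum)
  names(totcost)[names(totcost) == "cost"] <- "cost.sum"
  # number of orders per combination (same grouping, so same row order)
  totcost$n <- aggregate(cost ~ product + group, data = DF, FUN = length)$cost
  # ave() returns one value per row of totcost, ranked within each group;
  # negating cost.sum makes rank 1 the most expensive product
  totcost$rank <- ave(-totcost$cost.sum, totcost$group,
                      FUN = function(x) rank(x, ties.method = "first"))
  top <- totcost[totcost$rank <= n, ]
  top[order(top$group, top$rank), ]
}

# usage with data built the same way as in the thread
set.seed(1)
group <- rep(1:3, rep(10, 3))
product <- c("itema", "itemb", "itemc")[rep(rep(1:3, c(4, 2, 4)), 3)]
cost <- round(rnorm(30, mean = 100, sd = 30), 2)
DF <- data.frame(group, product, cost)
top_n_by_group(DF, n = 2)
```

For millions of orders the same idea carries over, but a package built for grouped operations may be faster; that is a judgment call rather than something tested here.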