Chris Conner wrote on 09/26/2011 11:33:05 AM:
> 
> Help-Rs
>  
> As someone who is newer to R and trying to make the transition from 
> Access into R, there is a frequently used function that I'd like to 
> try and duplicate in the R world.  It involved creating an aggregate
> table of the top (n)  orders for an item by sum of cost over a 
> select period of time.
>  
> So, take the following example :
>  
> group <- c(rep(1,10), rep(2,10), rep(3,10))
> product <- c(rep("itema", 4), rep("itemb", 2), rep("itemc", 1), rep
> ("itemc" , 3), rep("itema", 4), rep("itemb", 2), rep("itemc", 1), 
> rep("itemc", 3),rep("itema", 4), rep("itemb", 2), rep("itemc", 1), 
> rep("itemc", 3))
> cost <- round (rnorm(30, mean = 100, sd = 30), 2)
> DF <- data.frame(group, product, cost)
> agglist <- list(DF$product, DF$group)
> col1<- aggregate(DF [,3], by = agglist, sum)
> col2<- aggregate(DF [,3], by = agglist, length)
> (table <- cbind(col1, col2))
>  
> My question would be, how about if you wanted a table that retained 
> only the top 1 product (e.g., item c for group 2) by group... or for
> that matter the top n=2 or n=3 or n=5?  While with this example DF 
> the answer would be easy to find, I'm dealing with millions of orders.
> THX!
> Chris


This may not be the most elegant way, but it works.

group <- rep(1:3, rep(10, 3))
product <- c("itema", "itemb", "itemc")[rep(rep(1:3, c(4, 2, 4)), 3)]
cost <- round(rnorm(30, mean=100, sd=30), 2) 
DF <- data.frame(group, product, cost)

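# create a data frame with the total cost and the number of orders
# for each product within each group
totcost <- aggregate(data.frame(cost.sum=DF$cost),
  by=DF[, c("product", "group")], sum)
totcost$n <- aggregate(DF$cost, by=DF[, c("product", "group")], length)$x

# rank the products by total cost within a group (rank 1 = cheapest)
# note: unlist(tapply()) returns the ranks group by group, so this relies
# on totcost being sorted by group, as aggregate() returns it here
totcost$rank <- unlist(tapply(totcost$cost.sum, totcost$group, rank))

# order the products by rank within each group
totcost.ordered <- totcost[order(totcost$group, totcost$rank), ]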

# to see the top 2 ranked products by cost (the two cheapest)
totcost.ordered[totcost.ordered$rank <= 2, ]
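The unlist(tapply(...)) step above lines up with the rows of totcost only because those rows are already grouped by group. A minimal sketch of a variant that should not depend on row order uses ave(), which returns the within-group ranks in the original row positions. The set.seed() call, the shuffled copy, and the name cheapest2 are assumptions added for illustration, not part of the code above.

```r
set.seed(1)  # assumption: fixed seed so the sketch is reproducible

group <- rep(1:3, rep(10, 3))
product <- c("itema", "itemb", "itemc")[rep(rep(1:3, c(4, 2, 4)), 3)]
cost <- round(rnorm(30, mean=100, sd=30), 2)
DF <- data.frame(group, product, cost)

totcost <- aggregate(data.frame(cost.sum=DF$cost),
  by=DF[, c("product", "group")], sum)

# shuffle the rows to show that ave() does not depend on row order
shuffled <- totcost[sample(nrow(totcost)), ]

# ave() applies rank() within each group and puts each result back
# in the row it came from
shuffled$rank <- ave(shuffled$cost.sum, shuffled$group, FUN=rank)

# the two cheapest products per group, sorted for display
cheapest2 <- shuffled[shuffled$rank <= 2, ]
cheapest2[order(cheapest2$group, cheapest2$rank), ]
```

As with the tapply() version, rank() assigns averaged ranks to ties, so a tie at the cutoff can return more or fewer than 2 rows for a group.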


If you want the most expensive (instead of the cheapest), redefine rank 
as:
totcost$rank <- unlist(tapply(-totcost$cost.sum, totcost$group, rank))
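For an arbitrary n, one possible way to wrap this up is a small helper that sorts by decreasing total cost within each group and keeps the first n rows per group. This is only a sketch: top_n_by_group is a hypothetical name, and the toy totals below are made-up numbers chosen so the result is easy to check by eye.

```r
# hypothetical helper, not part of the code above
top_n_by_group <- function(d, n, value="cost.sum", by="group") {
  # sort by group, then by decreasing value within each group
  d <- d[order(d[[by]], -d[[value]]), ]
  # position of each row within its group after sorting (1, 2, 3, ...)
  pos <- ave(seq_len(nrow(d)), d[[by]], FUN=seq_along)
  d[pos <= n, ]
}

# toy totals (illustrative values only)
totcost <- data.frame(
  product  = rep(c("itema", "itemb", "itemc"), 3),
  group    = rep(1:3, each=3),
  cost.sum = c(400, 150, 420, 380, 210, 390, 410, 190, 405)
)

top1 <- top_n_by_group(totcost, 1)  # most expensive product per group
top2 <- top_n_by_group(totcost, 2)  # two most expensive per group
top1
```

Unlike rank(), this breaks ties by sort position, so each group contributes exactly n rows whenever it has at least n products.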


Jean



______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
