Take a look at
?split (and unsplit)

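For example:
Dur <- rnorm(100)
Attr1 <- rep(c("A","B"),each=50)
Attr2 <- rep(c("A","B"),times=50)

ap.dat <- data.frame(Attr1,Attr2,Dur)

# one group label per row, e.g. "A B"
split.fact <- paste(ap.dat$Attr1,ap.dat$Attr2)
# split into one data frame per group
ap.list <- split(ap.dat,split.fact)
# add the group mean as a new column to each piece
ap.mean <- lapply(ap.list,function(x){
        x$meanDur <- rep(mean(x$Dur),nrow(x))
        return(x)
  })

# put the pieces back together in the original row order
ap.dat.fast <- unsplit(ap.mean,split.fact)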
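
If all you need is the group mean repeated on every row, base R's ave() should give the same column in one call. This is only a sketch of an alternative, not something I timed: the set.seed() call and the name ap.dat.ave are my own additions for illustration, and the data are rebuilt here so the snippet runs on its own.

```r
# Rebuild the example data (seed added only to make the sketch reproducible)
set.seed(1)
ap.dat <- data.frame(
  Attr1 = rep(c("A", "B"), each = 50),
  Attr2 = rep(c("A", "B"), times = 50),
  Dur   = rnorm(100)
)

# ave() returns a vector as long as Dur, holding the mean of the
# Attr1/Attr2 group that each row belongs to (FUN defaults to mean)
ap.dat.ave <- ap.dat
ap.dat.ave$meanDur <- ave(ap.dat$Dur, ap.dat$Attr1, ap.dat$Attr2)

# Same result via split/lapply/unsplit, for comparison
split.fact <- paste(ap.dat$Attr1, ap.dat$Attr2)
ap.mean <- lapply(split(ap.dat, split.fact), function(x) {
  x$meanDur <- rep(mean(x$Dur), nrow(x))
  x
})
ap.dat.fast <- unsplit(ap.mean, split.fact)

# Both approaches keep the original row order, so the columns line up
isTRUE(all.equal(ap.dat.ave$meanDur, ap.dat.fast$meanDur))
```

The timings that follow are for the split/unsplit version, not for ave().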

system.time() on 1000 replicates gives:
> system.time(replicate(1000,{
+ split.fact <- paste(ap.dat$Attr1,ap.dat$Attr2)
+ ap.list <-split(ap.dat,split.fact)
+ ap.mean <-lapply(ap.list,functi .... [TRUNCATED]
   user  system elapsed
   4.88    0.00    4.88
> source(.trPaths[5], echo=TRUE, max.deparse.length=150)

> system.time(replicate(1000,{
+ avgDur <- aggregate(ap.dat[["Dur"]], by = list(ap.dat[["Attr1"]],
+ ap.dat[["Attr2"]]), FUN="mean")
+ meanDur <- sapp .... [TRUNCATED]
   user  system elapsed
  58.00    0.11   58.13
>

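Going by the elapsed times above (4.88 s versus 58.13 s), the split/unsplit version is more than ten times faster on this example.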
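
If you would rather keep aggregate(), the row-by-row lookup through sapply() is probably the expensive part, and a single merge() can replace it. A minimal sketch, assuming the column names from the example data; set.seed() and the name ap.dat.merged are mine, and I have not benchmarked this one. Note that merge() sorts the result by the key columns, so the rows no longer come back in their original order.

```r
# Rebuild the example data (seed added only to make the sketch reproducible)
set.seed(1)
ap.dat <- data.frame(
  Attr1 = rep(c("A", "B"), each = 50),
  Attr2 = rep(c("A", "B"), times = 50),
  Dur   = rnorm(100)
)

# One row per Attr1/Attr2 combination, with the mean of Dur
avgDur <- aggregate(Dur ~ Attr1 + Attr2, data = ap.dat, FUN = mean)
names(avgDur)[names(avgDur) == "Dur"] <- "meanDur"

# Join the group means back onto every row in one step,
# instead of looking them up row by row
ap.dat.merged <- merge(ap.dat, avgDur, by = c("Attr1", "Attr2"))

head(ap.dat.merged)
```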

Cheers
Joris


On Tue, Jun 1, 2010 at 4:48 PM, Stella Pachidi <stella.pach...@gmail.com> wrote:

> Dear R experts,
>
> I would really appreciate it if you had an idea on how to use the
> aggregate method more efficiently:
>
> More specifically, I would like to calculate the mean of certain
> values on a data frame,  grouped by various attributes, and then
> create a new column in the data frame that will have the corresponding
> mean for every row. I attach part of my code:
>
> matchMean <- function(ind,dataTable,aggrTable)
> {
>    index <- which((aggrTable[,1]==dataTable[["Attr1"]][ind]) &
> (aggrTable[,2]==dataTable[["Attr2"]][ind]))
>    as.numeric(aggrTable[index,3])
> }
>
> avgDur <- aggregate(ap.dat[["Dur"]], by = list(ap.dat[["Attr1"]],
> ap.dat[["Attr2"]]), FUN="mean")
> meanDur <- sapply((1:length(ap.dat[,1])), FUN=matchMean, ap.dat, avgDur)
> ap.dat <- cbind (ap.dat, meanDur)
>
> As I deal with a very large dataset, it takes a long time to run my
> matching function, so if you have an idea on how to speed up this
> matching process I would be really grateful.
>
> Thank you very much in advance!
>
> Kind regards,
> Stella
>
>
>
> --
> Stella Pachidi
> Master in Business Informatics student
> Utrecht University
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Joris Meys
Statistical Consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

Coupure Links 653
B-9000 Gent

tel : +32 9 264 59 87
joris.m...@ugent.be
