Take a look at ?split (and ?unsplit), e.g.:

Dur <- rnorm(100)
Attr1 <- rep(c("A","B"), each=50)
Attr2 <- rep(c("A","B"), times=50)
ap.dat <- data.frame(Attr1, Attr2, Dur)

# one grouping key per row, built from both attributes
split.fact <- paste(ap.dat$Attr1, ap.dat$Attr2)
# split the data frame into one piece per group
ap.list <- split(ap.dat, split.fact)
# add the group mean as a new column in each piece
ap.mean <- lapply(ap.list, function(x) {
  x$meanDur <- rep(mean(x$Dur), nrow(x))
  x
})
# reassemble the pieces in the original row order
ap.dat.fast <- unsplit(ap.mean, split.fact)

system.time on 1000 replicates gives:

> system.time(replicate(1000,{
+ split.fact <- paste(ap.dat$Attr1,ap.dat$Attr2)
+ ap.list <-split(ap.dat,split.fact)
+ ap.mean <-lapply(ap.list,functi .... [TRUNCATED]
   user  system elapsed
   4.88    0.00    4.88

> source(.trPaths[5], echo=TRUE, max.deparse.length=150)

> system.time(replicate(1000,{
+ avgDur <- aggregate(ap.dat[["Dur"]], by = list(ap.dat[["Attr1"]],
+ ap.dat[["Attr2"]]), FUN="mean")
+ meanDur <- sapp .... [TRUNCATED]
   user  system elapsed
  58.00    0.11   58.13
>

On this example the split/unsplit version is roughly twelve times faster (4.88 s versus 58.13 s elapsed).

Cheers
Joris

On Tue, Jun 1, 2010 at 4:48 PM, Stella Pachidi <stella.pach...@gmail.com> wrote:

> Dear R experts,
>
> I would really appreciate it if you had an idea on how to use the
> aggregate method more efficiently:
>
> More specifically, I would like to calculate the mean of certain
> values in a data frame, grouped by various attributes, and then
> create a new column in the data frame that will have the corresponding
> mean for every row. I attach part of my code:
>
> matchMean <- function(ind,dataTable,aggrTable)
> {
>   index <- which((aggrTable[,1]==dataTable[["Attr1"]][ind]) &
>                  (aggrTable[,2]==dataTable[["Attr2"]][ind]))
>   as.numeric(aggrTable[index,3])
> }
>
> avgDur <- aggregate(ap.dat[["Dur"]], by = list(ap.dat[["Attr1"]],
>     ap.dat[["Attr2"]]), FUN="mean")
> meanDur <- sapply((1:length(ap.dat[,1])), FUN=matchMean, ap.dat, avgDur)
> ap.dat <- cbind (ap.dat, meanDur)
>
> As I deal with a very large dataset, it takes a long time to run my
> matching function, so if you had an idea on how to automate this
> matching process further I would be really grateful.
>
> Thank you very much in advance!
>
> Kind regards,
> Stella
>
> --
> Stella Pachidi
> Master in Business Informatics student
> Utrecht University
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Joris Meys
Statistical Consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

Coupure Links 653
B-9000 Gent

tel : +32 9 264 59 87
joris.m...@ugent.be