I think with your data you will be computing a matrix that is 7049 x 11704. This will require about 700MB of memory. What size system do you have (how much memory)? How big is the dataframe? (do 'str' and report what it says). This will require a lot more resources and given that you have about 80M possible combinations, I would assume that a lot of them are probably empty. It is having to 'split' the data into the groups and then summarize. Maybe you should use a database with this combination of data.
2008/1/26 zhihuali <[EMAIL PROTECTED]>: > > I repeated your experiment: > > n <- 1000000 > > x <- data.frame(A=sample(LETTERS,n,TRUE), > > B=sample(letters[1:4],n,TRUE),C=sample(LETTERS[1:4], n, TRUE), > > data=runif(n)) > > system.time(x.agg <- aggregate(x$data, list(x$A, x$B, x$C), mean)) > user system elapsed > 1.824 0.212 2.038 > > > Now I use my own data: > > length(levels(group)) > [1] 7049 > > length(levels(type)) > [1] 11704 > > y<-data.frame(group,type,signal) > > system.time(y.agg <- aggregate(y$signal, list(y$group,y$type), mean)) > (I killed it after 30 minutes) > > > > > Date: Sat, 26 Jan 2008 19:55:51 -0500 > > From: [EMAIL PROTECTED] > > To: [EMAIL PROTECTED] > > Subject: Re: [R] Comparison of aggregate in R and group by in mysql > > CC: [EMAIL PROTECTED] > > > > > How large is your dataframe? How much memory do you have on your > > system? Are you paging? Here is a test I ran with a data frame with > > 1,000,000 entries and it seems to be fast: > > > > > n <- 1000000 > > > x <- data.frame(A=sample(LETTERS,n,TRUE), B=sample(letters[1:4],n,TRUE), > > + C=sample(LETTERS[1:4], n, TRUE), data=runif(n)) > > > system.time(x.agg <- aggregate(x$data, list(x$A, x$B, x$C), mean)) > > user system elapsed > > 2.65 0.34 3.00 > > > > > > > On Jan 26, 2008 6:45 PM, zhihuali <[EMAIL PROTECTED]> wrote: > > > > > > Hi, netters, > > > > > > First of all, thanks a lot for all the prompt replies to my earlier > > > question about "merging" data frames in R. > > > Actually that's an equivalence to the "join" clause in mysql. > > > > > > Now I have another question. Suppose I have a data frame X with lots of > > > columns/variables: > > > Name, Age,Group, Type, Salary. > > > I wanna do a subtotal of salaries: > > > aggregate(X$Salary, by=list(X$Group,X$Age,X$Type),Fun=mean) > > > > > > When the levels of Group and Type are huge, it took R forever to finish > > > the aggregation. > > > And I used gc to find that the memory usage was big too. > > > > > > However, in mysql, it took seconds to finish a similar job: > > > select Group,Age,Type ,avg(Salary) from X group by Group,Age,Type > > > > > > Is it because mysql is superior in doing such kind of things? Or my R > > > command is not efficient enough? Why did R have to consume huge memories > > > to do the aggregation? > > > > > > Thanks again! > > > > > > Zhihua Li > > > > > > _________________________________________________________________ > > > 天凉了,添衣了,心动了,"七件"了 > > > http://get.live.cn > > > [[alternative HTML version deleted]] > > > > > > > > > ______________________________________________ > > > R-help@r-project.org mailing list > > > https://stat.ethz.ch/mailman/listinfo/r-help > > > PLEASE do read the posting guide > > > http://www.R-project.org/posting-guide.html > > > and provide commented, minimal, self-contained, reproducible code. > > > > > > > > > > > > > > -- > > Jim Holtman > > Cincinnati, OH > > +1 513 646 9390 > > > > What is the problem you are trying to solve? > > _________________________________________________________________ > MSN圣诞礼物火热登场,免费发放中,快来领取吧! > http://im.live.cn/emoticons/?ID=18 > [[alternative HTML version deleted]] > > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > > -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem you are trying to solve? ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.