Thanks, Jim. My system has 1 GB of RAM. I guess the computed matrix is close to the memory limit, and that's what caused the problem. I think I'll take Wensui's suggestion and use a relational database system to handle data this large.
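For reference, the memory estimate quoted below can be checked with a few lines of arithmetic. This is only a rough sketch: it assumes the intermediate result is a dense matrix of doubles (8 bytes per cell) with one cell per (group, type) combination, and it ignores any extra copies R makes along the way. The variable names are made up for illustration.

```r
# Rough size of a dense numeric matrix with one cell per
# (group, type) combination, using the level counts from the thread.
n_group <- 7049
n_type  <- 11704

cells <- n_group * n_type   # number of possible combinations (~80 million)
bytes <- cells * 8          # assumption: one 8-byte double per cell

cat(sprintf("cells: %.0f\n", cells))
cat(sprintf("size:  %.0f MB (%.0f MiB)\n", bytes / 1e6, bytes / 1024^2))
```

That comes to roughly 660 MB for a single copy, which is in line with the "about 700MB" figure and leaves very little headroom on a 1 GB machine.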
> Date: Sat, 26 Jan 2008 20:40:51 -0500
> From: [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
> Subject: Re: [R] Comparison of aggregate in R and group by in mysql
> CC: [EMAIL PROTECTED]
>
> I think with your data you will be computing a matrix that is 7049 x
> 11704. This will require about 700MB of memory. What size system do
> you have (how much memory)? How big is the dataframe? (do 'str' and
> report what it says). This will require a lot more resources and
> given that you have about 80M possible combinations, I would assume
> that a lot of them are probably empty. It is having to 'split' the
> data into the groups and then summarize. Maybe you should use a
> database with this combination of data.
>
> 2008/1/26 zhihuali <[EMAIL PROTECTED]>:
> >
> > I repeated your experiment:
> > > n <- 1000000
> > > x <- data.frame(A=sample(LETTERS,n,TRUE),
> > +   B=sample(letters[1:4],n,TRUE),C=sample(LETTERS[1:4], n, TRUE),
> > +   data=runif(n))
> > > system.time(x.agg <- aggregate(x$data, list(x$A, x$B, x$C), mean))
> >    user  system elapsed
> >   1.824   0.212   2.038
> >
> > Now I use my own data:
> > > length(levels(group))
> > [1] 7049
> > > length(levels(type))
> > [1] 11704
> > > y<-data.frame(group,type,signal)
> > > system.time(y.agg <- aggregate(y$signal, list(y$group,y$type), mean))
> > (I killed it after 30 minutes)
> >
> > > Date: Sat, 26 Jan 2008 19:55:51 -0500
> > > From: [EMAIL PROTECTED]
> > > To: [EMAIL PROTECTED]
> > > Subject: Re: [R] Comparison of aggregate in R and group by in mysql
> > > CC: [EMAIL PROTECTED]
> > >
> > > How large is your dataframe? How much memory do you have on your
> > > system? Are you paging? Here is a test I ran with a data frame with
> > > 1,000,000 entries and it seems to be fast:
> > >
> > > > n <- 1000000
> > > > x <- data.frame(A=sample(LETTERS,n,TRUE), B=sample(letters[1:4],n,TRUE),
> > > +     C=sample(LETTERS[1:4], n, TRUE), data=runif(n))
> > > > system.time(x.agg <- aggregate(x$data, list(x$A, x$B, x$C), mean))
> > >    user  system elapsed
> > >    2.65    0.34    3.00
> > >
> > > On Jan 26, 2008 6:45 PM, zhihuali <[EMAIL PROTECTED]> wrote:
> > > >
> > > > Hi, netters,
> > > >
> > > > First of all, thanks a lot for all the prompt replies to my earlier
> > > > question about "merging" data frames in R.
> > > > Actually that's equivalent to the "join" clause in mysql.
> > > >
> > > > Now I have another question. Suppose I have a data frame X with lots of
> > > > columns/variables:
> > > > Name, Age, Group, Type, Salary.
> > > > I wanna do a subtotal of salaries:
> > > > aggregate(X$Salary, by=list(X$Group,X$Age,X$Type), FUN=mean)
> > > >
> > > > When the numbers of levels of Group and Type are huge, it takes R
> > > > forever to finish the aggregation.
> > > > And I used gc to find that the memory usage was big too.
> > > >
> > > > However, in mysql, it took seconds to finish a similar job:
> > > > select Group,Age,Type,avg(Salary) from X group by Group,Age,Type
> > > >
> > > > Is it because mysql is superior at this kind of thing? Or is my R
> > > > command not efficient enough? Why did R have to consume so much
> > > > memory to do the aggregation?
> > > >
> > > > Thanks again!
> > > >
> > > > Zhihua Li
> > >
> > > --
> > > Jim Holtman
> > > Cincinnati, OH
> > > +1 513 646 9390
> > >
> > > What is the problem you are trying to solve?
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem you are trying to solve?
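Since the quoted replies suggest that most of the ~80M (group, type) combinations are empty, one way to stay inside R is to compute means only for the combinations that actually occur, which is closer to what SQL's GROUP BY does. The following is only a sketch of that idea, not what anyone in this thread ran: the data are simulated, the column names group/type/signal follow the transcript above, and it assumes the labels never contain the "|" separator used to build the key.

```r
# Simulated stand-in for the real data (assumption: factors with many levels,
# far fewer rows than possible level combinations).
set.seed(1)
n <- 10000
y <- data.frame(
  group  = factor(sample(sprintf("g%04d", 1:500), n, TRUE)),
  type   = factor(sample(sprintf("t%04d", 1:800), n, TRUE)),
  signal = runif(n)
)

# One key per row; only combinations present in the data ever appear.
key <- paste(y$group, y$type, sep = "|")

# rowsum() adds up values within each key without building the full grid.
sums   <- rowsum(y$signal, key)
counts <- rowsum(rep(1, n), key)

# Recover the group/type labels from the first row carrying each key.
idx   <- match(rownames(sums), key)
y.agg <- data.frame(group = y$group[idx],
                    type  = y$type[idx],
                    mean  = as.vector(sums / counts))

head(y.agg)
```

The result has one row per observed combination, so its size scales with the number of rows in the data rather than with the product of the level counts.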
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.