I am sorry. Upon inspection, you only tried to create 70,000 categories. However, the calculations for creating the 140,000 subsetted values of pti and finc exhausted your memory, or the memory allocated to R.
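To illustrate the category count, here is a minimal sketch. The seed is an assumption added only for reproducibility; the draw of fid mirrors the one in the original post. Because fid comes from a continuous distribution, practically every value forms its own group, so grouping by it does not summarize anything.

```r
# Illustrative sketch: how many "groups" does a continuous grouping variable yield?
set.seed(1)                      # assumed seed, only for reproducibility
fid <- rnorm(70000, 100)         # same kind of draw as in the original post

n_groups <- length(unique(fid))  # number of distinct grouping values
print(n_groups)

# Ratio of groups to observations; a value near 1 means roughly one
# observation per group, i.e. no real grouping at all.
print(n_groups / length(fid))
```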
Best,
Daniel

-------------------------
cuncta stricte discussurus
-------------------------

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Daniel Malter
Sent: Friday, July 11, 2008 7:53 PM
To: 'sj'; 'r-help'
Subject: Re: [R] data summarization etc...

The problem is that you do not really have categories. You draw three sets of 70000 random normal variables and then try to subset one by the other. Since none of the values will coincide exactly with another, your code would create something like 70000^3 categories. No wonder you are running out of memory.

So what you are doing is nonsensical unless you really have some groups/categories that cluster your data and that are filled with a substantial number of observations (see example below).

    x1=rnorm(30000,0,1)
    x2=rnorm(30000,10,5)
    group1=rep(c(1:3),each=10000)
    group2=rep(c(1:3),10000)
    aggregate(cbind(x1,x2),list(group1,group2),FUN=mean)

Best,
Daniel

-------------------------
cuncta stricte discussurus
-------------------------

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of sj
Sent: Friday, July 11, 2008 6:47 PM
To: r-help
Subject: [R] data summarization etc...

Hello,

I am trying to do some fairly straightforward data summarization, i.e., the kind you would do with a pivot table in Excel or by using SQL queries. I have a moderately sized data set of ~70,000 records and I am trying to compute some group averages and sum values within groups.
The code example below shows how I am trying to go about doing this:

    pti <- rnorm(70000,10)
    fid <- rnorm(70000,100)
    finc <- rnorm(70000,1000)

    ### compute the sums of pti within fid groups
    sum_pinc <- aggregate(cbind(fid,pti),list(fid),FUN=sum)

    #### compute mean finc within fid groups
    tot_finc <- aggregate(cbind(fid,finc),list(fid),FUN=mean)

When I try to do it this way I get an error message telling me that enough memory cannot be allocated (I am using R 2.7.1 on Windows XP with 2 GB of memory). I figure that there must be a more efficient way to go about doing this. Please suggest.

I would typically do this kind of task in a database and use SQL to push the data around. I know RODBC allows you to write SQL to query external DBs. Is there any mechanism that allows you to write SQL queries against datasets internal to R? E.g., in the case above I could do something like

    set <- cbind(fid,pti,finc)
    select fid, sum(pti) from set group by fid

That would be handy!

Thanks,
Spencer

[[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
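For readers of this thread, a minimal base-R sketch of the "group by" summary asked about above. It assumes that fid is a genuine identifier shared by many records; the 1000 ids drawn with sample() and the seed are hypothetical and not part of the original post. tapply() is used here in place of aggregate() as one possible approach, not necessarily the fastest.

```r
# Sketch: per-group sum and mean when fid is a real grouping variable.
set.seed(42)                                # assumed seed, for reproducibility
n    <- 70000
fid  <- sample(1:1000, n, replace = TRUE)   # hypothetical: 1000 real groups
pti  <- rnorm(n, 10)
finc <- rnorm(n, 1000)

# Roughly: select fid, sum(pti) from set group by fid
sum_pti <- tapply(pti, fid, sum)

# Roughly: select fid, avg(finc) from set group by fid
mean_finc <- tapply(finc, fid, mean)

# Both results are ordered by the same group labels, so they can be combined.
summary_df <- data.frame(fid       = as.integer(names(sum_pti)),
                         sum_pti   = as.vector(sum_pti),
                         mean_finc = as.vector(mean_finc))
print(head(summary_df))
```

With about 70 records per group this runs on a few thousand numbers per summary rather than one group per record. For literal SQL syntax against data frames, the add-on package sqldf is one option; it is left out here to keep the sketch free of dependencies.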