Hello,
I am trying to do some fairly straightforward data summarization, i.e., the
kind you would do with a pivot table in excel or by using SQL queires. I
have a moderately sized data set of ~70,000 records and I am trying to
compute some group averages and sum values within groups. the code example
below shows how I am trying to go about doing this
pti <-rnorm(70000,10)
fid <- rnorm(70000,100)
finc <- rnorm(70000,1000)
### compute the sums of pti within fid groups
sum_pinc <-aggregate(cbind(fid,pti),list(fid),FUN=sum)
#### compute mean finc within fid groups
tot_finc <- aggregate(cbind(fid,finc),list(fid),FUN=mean)
when I try to do it this way I get an error message telling me that enough
memory cannot be allocated ( I am using R 2.7.1 on Windows XP with 2 GB of
Memory). I figure that there must be a more efficent way to go about doing
this. Please suggest.
I would typically do this kind of task in a database and use SQL to push the
data around. I know RODBC allows you to write SQL to query external DBs. Is
there any mechanisim that allows you to write SQL queies against datasets
internal to R e.g. in the case above
I could do something like
set <- cbind(fid,pti,finc)
select fid, sum(pti)
from set
group by fid
that would be handy!
Thanks,
Spencer
[[alternative HTML version deleted]]
______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.