Try using the formula interface with na.action=na.pass. It is all described in help(aggregate).
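A minimal sketch of that suggestion, on a small made-up data frame rather than the real data (the names dat, control, by_formula and by_factor are only for illustration). It also runs the factor conversion mentioned in the quoted message, using addNA from base R, so that both totals can be compared with the control total. Whether the formula call keeps the rows whose grouping value is NA is worth checking this way before relying on it.

```r
# Toy data (hypothetical, not the poster's): NA occurs in a grouping
# column (x1) and in the value column (y).
dat <- data.frame(
  x1 = c("m", NA, "f", "m", NA),
  y  = c(1, 2, NA, 4, 8),
  stringsAsFactors = FALSE
)

# Control total over all rows.
control <- sum(dat$y, na.rm = TRUE)

# Formula interface; na.action = na.pass stops the model frame from
# dropping rows that contain NA before FUN is applied.
by_formula <- aggregate(y ~ x1, data = dat, FUN = sum,
                        na.rm = TRUE, na.action = na.pass)

# Factor conversion with NA kept as an explicit level.
by_factor <- aggregate(dat$y, by = list(x1 = addNA(dat$x1)),
                       FUN = sum, na.rm = TRUE)

# Compare each aggregate total with the control total.
c(control = control,
  formula = sum(by_formula$y),
  factor  = sum(by_factor$x))
```

If one of the totals falls short of the control total, that call has dropped the rows with NA in x1.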
On Sun, 06/02/2011 at 14:54 -0600, Gene Leynes wrote:
> On Fri, Feb 4, 2011 at 6:54 PM, Ista Zahn <iz...@psych.rochester.edu> wrote:
>
> > > However, I don't think you've told us what you're actually trying to
> > > accomplish...
>
> I'm trying to aggregate the y value of a big data set which has several x's
> and a y.
> I'm using an abstracted example for many reasons. Partially, I'm using an
> abstracted example to comply with the posting guidelines of having a
> reproducible example. I'm really aggregating some incredibly boring and
> complex customer data for an undisclosed client.
>
> As it turns out,
> Aggregate will not work when some of the x's are NA, unless you convert them to
> factors, with NA's included.
>
> In my case, the data is so big that doing the conversions causes other
> memory problems, and renders some of my numeric values useless for other
> calculations.
>
> My real data looks more like this (except with a few more categories and
> records):
>
> set.seed(100)
> library(plyr)
> dat=data.frame(
>     x1=sample(c(NA,'m','f'), 2e6, replace=TRUE),
>     x2=sample(c(NA, 1:10), 2e6, replace=TRUE),
>     x3=sample(c(NA,letters[1:5]), 2e6, replace=TRUE),
>     x4=sample(c(NA,T,F), 2e6, replace=TRUE),
>     x5=sample(c(NA,'active','inactive','deleted','resumed'), 2e6,
>               replace=TRUE),
>     x6=sample(c(NA, 1:10), 2e6, replace=TRUE),
>     x7=sample(c(NA,'married','divorced','separated','single','etc'),
>               2e6, replace=TRUE),
>     x8=sample(c(NA,T,F), 2e6, replace=TRUE),
>     y=trunc(rnorm(2e6)*10000), stringsAsFactors=F)
> str(dat)
> ## The control total
> sum(dat$y, na.rm=T)
> ## The aggregate total
> sum(aggregate(dat$y, dat[,1:8], sum, na.rm=T)$x)
> ## The ddply total
> sum(ddply(dat, .(x1,x2,x3,x4,x5,x6,x7,x8), function(x)
>           {data.frame(y.sum=sum(x$y,na.rm=TRUE))})$y.sum)
>
> ddply worked a little better than I expected at first, but it slows to a
> crawl or runs out of memory too often for me to invest the time learning
> how to use it.
> Just now it worked for 1m records, and it was just a bit
> slower than aggregate. But for the 2m example it hasn't finished
> calculating.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.