By the way, thanks for sending that formula. It was quite thoughtful of you to send an answer with an actual working line of code!
When I experimented with ddply earlier last week I couldn't figure out the syntax for a single line aggregation, so it's good to have this example. I will likely use it for other things.

On Fri, Feb 4, 2011 at 6:54 PM, Ista Zahn <iz...@psych.rochester.edu> wrote:

> oops. For clarity, that should have been
>
> sum(ddply(dat, .(x1,x2,x3,x4), function(x){data.frame(y.sum=sum(x$y, na.rm=TRUE))})$y.sum)
>
> -Ista
>
> On Fri, Feb 4, 2011 at 7:52 PM, Ista Zahn <iz...@psych.rochester.edu> wrote:
> > Hi again,
> >
> > On Fri, Feb 4, 2011 at 7:18 PM, Gene Leynes <gleyne...@gmail.com> wrote:
> >> Ista,
> >>
> >> Thank you again.
> >>
> >> I had figured that out... and was crafting another message when you replied.
> >>
> >> The NAs do come through on the variable that is being aggregated.
> >> However, they do not come through on the categorical variable(s).
> >>
> >> The aggregate function must be converting the data frame variables to
> >> factors, with the default "omit=NA" parameter.
> >>
> >> The help on "aggregate" says:
> >>   na.action  A function which indicates what should happen when the data
> >>              contain NA values.
> >>              The default is to ignore missing values in the given
> >>              variables.
> >> By "data" it must only refer to the aggregated variable, and not the
> >> categorical variables. I thought it referred to both, because I thought it
> >> referred to the "data" argument, which is the underlying data frame.
> >>
> >> I think the proper way to accomplish this would be to recast my x
> >> (categorical) variables as factors.
> >
> > Yes, that would work.
> >
> >> This is not feasible for me due to other complications.
> >> Also, (imho) the help should be more clear about what the na.action
> >> modifies.
> >>
> >> So, unless someone has a better idea, I guess I'm out of luck?
> >
> > Well, you can use ddply from the plyr package:
> >
> > library(plyr) # may need to install first.
> > sum(ddply(dat, .(x1,x2,x3,x4), function(x){data.frame(y.sum=sum(x$y, na.rm=TRUE))})$y)
> >
> > However, I don't think you've told us what you're actually trying to
> > accomplish...
> >
> > Best,
> > Ista
> >
> >>
> >> On Fri, Feb 4, 2011 at 6:05 PM, Ista Zahn <iz...@psych.rochester.edu> wrote:
> >>>
> >>> Hi,
> >>>
> >>> On Fri, Feb 4, 2011 at 6:33 PM, Gene Leynes <gleyne...@gmail.com> wrote:
> >>> > Thank you both for the thoughtful (and funny) replies.
> >>> >
> >>> > I agree with both of you that sum is the one picking up aggregate.
> >>> > Although I didn't mention it, I did realize that in the first place.
> >>> > Also, thank you Phil for pointing out that aggregate only accepts a
> >>> > formula value in more recent versions! I actually thought that was an
> >>> > older feature, but I must be thinking of other functions.
> >>> >
> >>> > I still don't see why these two values are not the same!
> >>> >
> >>> > It seems like a bug to me
> >>>
> >>> No, not a bug (see below).
> >>>
> >>> >
> >>> > > set.seed(100)
> >>> > > dat=data.frame(
> >>> > +     x1=sample(c(NA,'m','f'), 100, replace=TRUE),
> >>> > +     x2=sample(c(NA, 1:10), 100, replace=TRUE),
> >>> > +     x3=sample(c(NA,letters[1:5]), 100, replace=TRUE),
> >>> > +     x4=sample(c(NA,T,F), 100, replace=TRUE),
> >>> > +     y=sample(c(rep(NA,5), rnorm(95))))
> >>> > > sum(dat$y, na.rm=T)
> >>> > [1] 0.0815244116598
> >>> > > sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.pass, na.rm=T)$y)
> >>> > [1] -4.45087666247
> >>> > >
> >>>
> >>> Because in the first one you are only removing missing data in dat$y.
> >>> In the second one you are removing all rows that contain missing data
> >>> in any of the columns.
> >>>
> >>> all.equal(sum(na.omit(dat)$y), sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.pass, na.rm=T)$y))
> >>> [1] TRUE
> >>>
> >>> Best,
> >>> Ista
> >>>
> >>> >
> >>> > On Fri, Feb 4, 2011 at 4:18 PM, Ista Zahn <iz...@psych.rochester.edu> wrote:
> >>> >>
> >>> >> Sorry, I didn't see Phil's reply, which is better than mine anyway.
> >>> >>
> >>> >> -Ista
> >>> >>
> >>> >> On Fri, Feb 4, 2011 at 5:16 PM, Ista Zahn <iz...@psych.rochester.edu> wrote:
> >>> >> > Hi,
> >>> >> >
> >>> >> > Please see ?na.action
> >>> >> >
> >>> >> > (just kidding!)
> >>> >> >
> >>> >> > So it seems to me the problem is that you are passing na.rm to the sum
> >>> >> > function. So there is no missing data for the na.action argument to
> >>> >> > operate on!
> >>> >> >
> >>> >> > Compare
> >>> >> >
> >>> >> > sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.fail)$y)
> >>> >> > sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.pass)$y)
> >>> >> > sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.omit)$y)
> >>> >> >
> >>> >> >
> >>> >> > Best,
> >>> >> > Ista
> >>> >> >
> >>> >> > On Fri, Feb 4, 2011 at 4:07 PM, Gene Leynes <gleyne...@gmail.com> wrote:
> >>> >> >> Can someone please tell me what is up with na.action in aggregate?
> >>> >> >>
> >>> >> >> My (somewhat) reproducible example:
> >>> >> >> (I say somewhat because some lines wouldn't run in a separate session,
> >>> >> >> more below)
> >>> >> >>
> >>> >> >> set.seed(100)
> >>> >> >> dat=data.frame(
> >>> >> >>     x1=sample(c(NA,'m','f'), 100, replace=TRUE),
> >>> >> >>     x2=sample(c(NA, 1:10), 100, replace=TRUE),
> >>> >> >>     x3=sample(c(NA,letters[1:5]), 100, replace=TRUE),
> >>> >> >>     x4=sample(c(NA,T,F), 100, replace=TRUE),
> >>> >> >>     y=sample(c(rep(NA,5), rnorm(95))))
> >>> >> >> dat
> >>> >> >> ## The total from dat:
> >>> >> >> sum(dat$y, na.rm=T)
> >>> >> >> ## The total from aggregate:
> >>> >> >> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T)$x)
> >>> >> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T)$y)  ## <--- This line gave an error in a separate R instance
> >>> >> >> ## The aggregate formula is excluding NA
> >>> >> >>
> >>> >> >> ## So, let's try to include NAs
> >>> >> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action='na.pass')$y)
> >>> >> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action=na.pass)$y)
> >>> >> >> ## The aggregate formula is STILL excluding NA
> >>> >> >> ## In fact, the formula doesn't seem to notice the na.action
> >>> >> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action='foo man chew')$y)
> >>> >> >> ## Hmmmm... that error surprised me (since the previous two things ran)
> >>> >> >>
> >>> >> >> ## So, let's try to change the global options
> >>> >> >> ## (not mentioned in the help, but after reading the help
> >>> >> >> ## 100 times, I thought I would go above and beyond to avoid
> >>> >> >> ## any r list flames from people complaining
> >>> >> >> ## that I didn't read the help... but that's a separate topic)
> >>> >> >> options(na.action ="na.pass")
> >>> >> >> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T)$x)
> >>> >> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T)$y)
> >>> >> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action='na.pass')$y)
> >>> >> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action=na.pass)$y)
> >>> >> >> ## (NAs are still omitted)
> >>> >> >>
> >>> >> >> ## Even more frustrating...
> >>> >> >> ## Why don't any of these work???
> >>> >> >> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action='na.pass')$x)
> >>> >> >> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action=na.pass)$x)
> >>> >> >> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action='na.omit')$x)
> >>> >> >> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action=na.omit)$x)
> >>> >> >>
> >>> >> >>
> >>> >> >> ## This does work, but in my real data set, I want NA to really be NA
> >>> >> >> for(j in 1:4)
> >>> >> >>     dat[is.na(dat[,j]),j] = 'NA'
> >>> >> >> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T)$x)
> >>> >> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T)$y)
> >>> >> >>
> >>> >> >>
> >>> >> >> ## My first session info
> >>> >> >> #
> >>> >> >> #> sessionInfo()
> >>> >> >> #R version 2.12.0 (2010-10-15)
> >>> >> >> #Platform: i386-pc-mingw32/i386 (32-bit)
> >>> >> >> #
> >>> >> >> #locale:
> >>> >> >> # [1] LC_COLLATE=English_United States.1252
> >>> >> >> #[2] LC_CTYPE=English_United States.1252
> >>> >> >> #[3] LC_MONETARY=English_United States.1252
> >>> >> >> #[4] LC_NUMERIC=C
> >>> >> >> #[5] LC_TIME=English_United States.1252
> >>> >> >> #
> >>> >> >> #attached base packages:
> >>> >> >> # [1] stats     graphics  grDevices utils     datasets  methods   base
> >>> >> >> #
> >>> >> >> #other attached packages:
> >>> >> >> # [1] plyr_1.2.1  zoo_1.6-4   gdata_2.8.1 rj_0.5.0-5
> >>> >> >> #
> >>> >> >> #loaded via a namespace (and not attached):
> >>> >> >> # [1] grid_2.12.0     gtools_2.6.2    lattice_0.19-13 rJava_0.8-8
> >>> >> >> #[5] tools_2.12.0
> >>> >> >>
> >>> >> >>
> >>> >> >> I tried running that example in a different version of R, and I got
> >>> >> >> completely different results.
> >>> >> >>
> >>> >> >> The other version of R wouldn't recognize the formula at all.
> >>> >> >>
> >>> >> >> My other version of R:
> >>> >> >>
> >>> >> >> # My second session info
> >>> >> >> #> sessionInfo()
> >>> >> >> #R version 2.10.1 (2009-12-14)
> >>> >> >> #i386-pc-mingw32
> >>> >> >> #
> >>> >> >> #locale:
> >>> >> >> # [1] LC_COLLATE=English_United States.1252
> >>> >> >> #[2] LC_CTYPE=English_United States.1252
> >>> >> >> #[3] LC_MONETARY=English_United States.1252
> >>> >> >> #[4] LC_NUMERIC=C
> >>> >> >> #[5] LC_TIME=English_United States.1252
> >>> >> >> #
> >>> >> >> #attached base packages:
> >>> >> >> # [1] stats     graphics  grDevices utils     datasets  methods   base
> >>> >> >> #>
> >>> >> >> #
> >>> >> >>
> >>> >> >> PS: Also, I have read the help on aggregate, factor, as.factor, and
> >>> >> >> several other topics. If I missed something, please let me know.
> >>> >> >> Some people like to reply to questions by telling the sender that R has
> >>> >> >> documentation. Please don't. The R help archives are littered with
> >>> >> >> reminders, friendly and otherwise, of R's documentation.
> >>> >> >>
> >>> >> >> [[alternative HTML version deleted]]
> >>> >> >>
> >>> >> >> ______________________________________________
> >>> >> >> R-help@r-project.org mailing list
> >>> >> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >>> >> >> PLEASE do read the posting guide
> >>> >> >> http://www.R-project.org/posting-guide.html
> >>> >> >> and provide commented, minimal, self-contained, reproducible code.
> >>> >> >
> >>> >> > --
> >>> >> > Ista Zahn
> >>> >> > Graduate student
> >>> >> > University of Rochester
> >>> >> > Department of Clinical and Social Psychology
> >>> >> > http://yourpsyche.org
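
For reference, the factor idea raised in the thread (keeping NA as its own level of each grouping variable so that aggregate does not drop those groups) can be sketched in base R roughly as follows. This is only a sketch under assumptions: it applies addNA() on the fly instead of changing dat, and the names groups, agg, total_direct, total_formula and total_grouped are made up for illustration. The actual numbers depend on the R version's random number generator, so none are shown.

```r
# Rebuild the example data from the original post.
set.seed(100)
dat <- data.frame(
  x1 = sample(c(NA, "m", "f"), 100, replace = TRUE),
  x2 = sample(c(NA, 1:10), 100, replace = TRUE),
  x3 = sample(c(NA, letters[1:5]), 100, replace = TRUE),
  x4 = sample(c(NA, TRUE, FALSE), 100, replace = TRUE),
  y  = sample(c(rep(NA, 5), rnorm(95)))
)

# Total over all non-missing y values, ignoring the grouping columns.
total_direct <- sum(dat$y, na.rm = TRUE)

# Formula interface with its default na.action (na.omit): any row with an
# NA in x1..x4 or y is dropped before grouping, so this total is smaller
# in scope than total_direct.
total_formula <- sum(aggregate(y ~ x1 + x2 + x3 + x4, data = dat, FUN = sum)$y)

# addNA() makes NA an explicit factor level, so rows whose grouping
# values are missing still form groups of their own.
groups <- lapply(dat[, c("x1", "x2", "x3", "x4")], addNA)
agg <- aggregate(dat$y, by = groups, FUN = sum, na.rm = TRUE)
total_grouped <- sum(agg$x)

# The formula total matches the complete-cases total;
# the addNA() total matches the direct total.
isTRUE(all.equal(total_formula, sum(na.omit(dat)$y)))
isTRUE(all.equal(total_grouped, total_direct))
```

Because the factors are built only for the call to aggregate, the NA values in dat itself stay as real NA, which may or may not sidestep the "other complications" mentioned above.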