Thank you both for the thoughtful (and funny) replies. I agree with both of you that sum is the one picking up na.rm. Although I didn't mention it, I did realize that in the first place. Also, thank you Phil for pointing out that aggregate only accepts a formula in more recent versions! I actually thought that was an older feature, but I must be thinking of other functions.
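To check that I have understood the point about na.rm, here is a minimal sketch on a small made-up data frame (the names d, g and y are only for illustration and are not my real data). As far as I can tell, na.rm is not an argument of aggregate itself and is forwarded through ... to sum, whereas na.action belongs to the formula method and decides which rows reach sum in the first place.

```r
# Hypothetical toy data, not the dat from my example
d <- data.frame(g = c("a", "a", "b", "b"),
                y = c(1, NA, 2, 3))

# Data frame method: na.rm is not an aggregate() argument,
# so it is forwarded through `...` to sum()
narm_on  <- aggregate(d$y, list(g = d$g), sum, na.rm = TRUE)
narm_off <- aggregate(d$y, list(g = d$g), sum)

# Formula method: na.action filters rows before sum() is called.
# The default (na.omit) drops the row with the missing y;
# na.pass keeps it, and then only na.rm can remove it inside sum()
f_default <- aggregate(y ~ g, data = d, FUN = sum)
f_pass    <- aggregate(y ~ g, data = d, FUN = sum, na.action = na.pass)
f_both    <- aggregate(y ~ g, data = d, FUN = sum,
                       na.action = na.pass, na.rm = TRUE)

print(narm_on)    # group a sums to 1, group b to 5
print(narm_off)   # group a is NA, group b is 5
print(f_default)  # same totals as narm_on
print(f_pass)     # group a is NA again
print(f_both)     # same totals as narm_on
```

This toy case has no missing values in the grouping column, so it only covers NA in y, not NA in x1 to x4.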

I still don't see why these two values are not the same! It seems like a bug to me:

> set.seed(100)
> dat=data.frame(
+     x1=sample(c(NA,'m','f'), 100, replace=TRUE),
+     x2=sample(c(NA, 1:10), 100, replace=TRUE),
+     x3=sample(c(NA,letters[1:5]), 100, replace=TRUE),
+     x4=sample(c(NA,T,F), 100, replace=TRUE),
+     y=sample(c(rep(NA,5), rnorm(95))))
> sum(dat$y, na.rm=T)
[1] 0.0815244116598
> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.pass, na.rm=T)$y)
[1] -4.45087666247

On Fri, Feb 4, 2011 at 4:18 PM, Ista Zahn <iz...@psych.rochester.edu> wrote:

> Sorry, I didn't see Phil's reply, which is better than mine anyway.
>
> -Ista
>
> On Fri, Feb 4, 2011 at 5:16 PM, Ista Zahn <iz...@psych.rochester.edu> wrote:
> > Hi,
> >
> > Please see ?na.action
> >
> > (just kidding!)
> >
> > So it seems to me the problem is that you are passing na.rm to the sum
> > function. So there is no missing data for the na.action argument to
> > operate on!
> >
> > Compare
> >
> > sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.fail)$y)
> > sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.pass)$y)
> > sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.omit)$y)
> >
> >
> > Best,
> > Ista
> >
> > On Fri, Feb 4, 2011 at 4:07 PM, Gene Leynes <gleyne...@gmail.com> wrote:
> >> Can someone please tell me what is up with na.action in aggregate?
> >>
> >> My (somewhat) reproducible example:
> >> (I say somewhat because some lines wouldn't run in a separate session, more below)
> >>
> >> set.seed(100)
> >> dat=data.frame(
> >>     x1=sample(c(NA,'m','f'), 100, replace=TRUE),
> >>     x2=sample(c(NA, 1:10), 100, replace=TRUE),
> >>     x3=sample(c(NA,letters[1:5]), 100, replace=TRUE),
> >>     x4=sample(c(NA,T,F), 100, replace=TRUE),
> >>     y=sample(c(rep(NA,5), rnorm(95))))
> >> dat
> >> ## The total from dat:
> >> sum(dat$y, na.rm=T)
> >> ## The total from aggregate:
> >> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T)$x)
> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T)$y)  ## <--- This line gave an error in a separate R instance
> >> ## The aggregate formula is excluding NA
> >>
> >> ## So, let's try to include NAs
> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action='na.pass')$y)
> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action=na.pass)$y)
> >> ## The aggregate formula is STILL excluding NA
> >> ## In fact, the formula doesn't seem to notice the na.action
> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action='foo man chew')$y)
> >> ## Hmmmm... that error surprised me (since the previous two things ran)
> >>
> >> ## So, let's try to change the global options
> >> ## (not mentioned in the help, but after reading the help
> >> ## 100 times, I thought I would go above and beyond to avoid
> >> ## any r list flames from people complaining
> >> ## that I didn't read the help... but that's a separate topic)
> >> options(na.action ="na.pass")
> >> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T)$x)
> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T)$y)
> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action='na.pass')$y)
> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action=na.pass)$y)
> >> ## (NAs are still omitted)
> >>
> >> ## Even more frustrating...
> >> ## Why don't any of these work???
> >> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action='na.pass')$x)
> >> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action=na.pass)$x)
> >> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action='na.omit')$x)
> >> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action=na.omit)$x)
> >>
> >>
> >> ## This does work, but in my real data set, I want NA to really be NA
> >> for(j in 1:4)
> >>     dat[is.na(dat[,j]),j] = 'NA'
> >> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T)$x)
> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T)$y)
> >>
> >>
> >> ## My first session info
> >> #
> >> #> sessionInfo()
> >> #R version 2.12.0 (2010-10-15)
> >> #Platform: i386-pc-mingw32/i386 (32-bit)
> >> #
> >> #locale:
> >> # [1] LC_COLLATE=English_United States.1252
> >> #[2] LC_CTYPE=English_United States.1252
> >> #[3] LC_MONETARY=English_United States.1252
> >> #[4] LC_NUMERIC=C
> >> #[5] LC_TIME=English_United States.1252
> >> #
> >> #attached base packages:
> >> # [1] stats graphics grDevices utils datasets methods base
> >> #
> >> #other attached packages:
> >> # [1] plyr_1.2.1 zoo_1.6-4 gdata_2.8.1 rj_0.5.0-5
> >> #
> >> #loaded via a namespace (and not attached):
> >> # [1] grid_2.12.0 gtools_2.6.2 lattice_0.19-13 rJava_0.8-8
> >> #[5] tools_2.12.0
> >>
> >>
> >>
> >> I tried running that example in a different version of R, and I got
> >> completely different results.
> >>
> >> The other version of R wouldn't recognize the formula at all.
> >>
> >> My other version of R:
> >>
> >> # My second session info
> >> #> sessionInfo()
> >> #R version 2.10.1 (2009-12-14)
> >> #i386-pc-mingw32
> >> #
> >> #locale:
> >> # [1] LC_COLLATE=English_United States.1252
> >> #[2] LC_CTYPE=English_United States.1252
> >> #[3] LC_MONETARY=English_United States.1252
> >> #[4] LC_NUMERIC=C
> >> #[5] LC_TIME=English_United States.1252
> >> #
> >> #attached base packages:
> >> # [1] stats graphics grDevices utils datasets methods base
> >> #>
> >> #
> >>
> >> PS: Also, I have read the help on aggregate, factor, as.factor, and several
> >> other topics. If I missed something, please let me know.
> >> Some people like to reply to questions by telling the sender that R has
> >> documentation. Please don't. The R help archives are littered with
> >> reminders, friendly and otherwise, of R's documentation.
> >>
> >> [[alternative HTML version deleted]]
> >>
> >> ______________________________________________
> >> R-help@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >>
> >
> > --
> > Ista Zahn
> > Graduate student
> > University of Rochester
> > Department of Clinical and Social Psychology
> > http://yourpsyche.org