Ista,

Thank you again.
I had figured that out... and was crafting another message when you replied.

The NAs do come through on the variable that is being aggregated. However, they do not come through on the categorical variable(s). The aggregate function must be converting the data frame variables to factors, with the default "exclude=NA" argument.

The help on "aggregate" says:

    na.action   A function which indicates what should happen when the data contain NA values. The default is to ignore missing values in the given variables.

By "data" it must only refer to the aggregated variable, and not the categorical variables. I thought it referred to both, because I thought it referred to the "data" argument, which is the underlying data frame.

I think the proper way to accomplish this would be to recast my x (categorical) variables as factors. This is not feasible for me due to other complications. Also, (imho) the help should be clearer about what the na.action modifies.

So, unless someone has a better idea, I guess I'm out of luck?

On Fri, Feb 4, 2011 at 6:05 PM, Ista Zahn <iz...@psych.rochester.edu> wrote:

> Hi,
>
> On Fri, Feb 4, 2011 at 6:33 PM, Gene Leynes <gleyne...@gmail.com> wrote:
> > Thank you both for the thoughtful (and funny) replies.
> >
> > I agree with both of you that sum is the one picking up aggregate. Although
> > I didn't mention it, I did realize that in the first place.
> > Also, thank you Phil for pointing out that aggregate only accepts a formula
> > value in more recent versions! I actually thought that was an older
> > feature, but I must be thinking of other functions.
> >
> > I still don't see why these two values are not the same!
> >
> > It seems like a bug to me
>
> No, not a bug (see below).
> >
> >> set.seed(100)
> >> dat=data.frame(
> > +   x1=sample(c(NA,'m','f'), 100, replace=TRUE),
> > +   x2=sample(c(NA, 1:10), 100, replace=TRUE),
> > +   x3=sample(c(NA,letters[1:5]), 100, replace=TRUE),
> > +   x4=sample(c(NA,T,F), 100, replace=TRUE),
> > +   y=sample(c(rep(NA,5), rnorm(95))))
> >> sum(dat$y, na.rm=T)
> > [1] 0.0815244116598
> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.pass, na.rm=T)$y)
> > [1] -4.45087666247
> >>
>
> Because in the first one you are only removing missing data in dat$y.
> In the second one you are removing all rows that contain missing data
> in any of the columns.
>
> all.equal(sum(na.omit(dat)$y), sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.pass, na.rm=T)$y))
> [1] TRUE
>
> Best,
> Ista
>
> > On Fri, Feb 4, 2011 at 4:18 PM, Ista Zahn <iz...@psych.rochester.edu> wrote:
> >>
> >> Sorry, I didn't see Phil's reply, which is better than mine anyway.
> >>
> >> -Ista
> >>
> >> On Fri, Feb 4, 2011 at 5:16 PM, Ista Zahn <iz...@psych.rochester.edu> wrote:
> >> > Hi,
> >> >
> >> > Please see ?na.action
> >> >
> >> > (just kidding!)
> >> >
> >> > So it seems to me the problem is that you are passing na.rm to the sum
> >> > function. So there is no missing data for the na.action argument to
> >> > operate on!
> >> >
> >> > Compare
> >> >
> >> > sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.fail)$y)
> >> > sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.pass)$y)
> >> > sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.omit)$y)
> >> >
> >> > Best,
> >> > Ista
> >> >
> >> > On Fri, Feb 4, 2011 at 4:07 PM, Gene Leynes <gleyne...@gmail.com> wrote:
> >> >> Can someone please tell me what is up with na.action in aggregate?
> >> >>
> >> >> My (somewhat) reproducible example:
> >> >> (I say somewhat because some lines wouldn't run in a separate session,
> >> >> more below)
> >> >>
> >> >> set.seed(100)
> >> >> dat=data.frame(
> >> >>   x1=sample(c(NA,'m','f'), 100, replace=TRUE),
> >> >>   x2=sample(c(NA, 1:10), 100, replace=TRUE),
> >> >>   x3=sample(c(NA,letters[1:5]), 100, replace=TRUE),
> >> >>   x4=sample(c(NA,T,F), 100, replace=TRUE),
> >> >>   y=sample(c(rep(NA,5), rnorm(95))))
> >> >> dat
> >> >> ## The total from dat:
> >> >> sum(dat$y, na.rm=T)
> >> >> ## The total from aggregate:
> >> >> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T)$x)
> >> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T)$y)  ## <--- This line gave an error in a separate R instance
> >> >> ## The aggregate formula is excluding NA
> >> >>
> >> >> ## So, let's try to include NAs
> >> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action='na.pass')$y)
> >> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action=na.pass)$y)
> >> >> ## The aggregate formula is STILL excluding NA
> >> >> ## In fact, the formula doesn't seem to notice the na.action
> >> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action='foo man chew')$y)
> >> >> ## Hmmmm... that error surprised me (since the previous two things ran)
> >> >>
> >> >> ## So, let's try to change the global options
> >> >> ## (not mentioned in the help, but after reading the help
> >> >> ## 100 times, I thought I would go above and beyond to avoid
> >> >> ## any r list flames from people complaining
> >> >> ## that I didn't read the help... but that's a separate topic)
> >> >> options(na.action ="na.pass")
> >> >> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T)$x)
> >> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T)$y)
> >> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action='na.pass')$y)
> >> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action=na.pass)$y)
> >> >> ## (NAs are still omitted)
> >> >>
> >> >> ## Even more frustrating...
> >> >> ## Why don't any of these work???
> >> >> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action='na.pass')$x)
> >> >> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action=na.pass)$x)
> >> >> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action='na.omit')$x)
> >> >> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action=na.omit)$x)
> >> >>
> >> >> ## This does work, but in my real data set, I want NA to really be NA
> >> >> for(j in 1:4)
> >> >>   dat[is.na(dat[,j]),j] = 'NA'
> >> >> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T)$x)
> >> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T)$y)
> >> >>
> >> >> ## My first session info
> >> >> #
> >> >> #> sessionInfo()
> >> >> #R version 2.12.0 (2010-10-15)
> >> >> #Platform: i386-pc-mingw32/i386 (32-bit)
> >> >> #
> >> >> #locale:
> >> >> # [1] LC_COLLATE=English_United States.1252
> >> >> #[2] LC_CTYPE=English_United States.1252
> >> >> #[3] LC_MONETARY=English_United States.1252
> >> >> #[4] LC_NUMERIC=C
> >> >> #[5] LC_TIME=English_United States.1252
> >> >> #
> >> >> #attached base packages:
> >> >> # [1] stats graphics grDevices utils datasets methods base
> >> >> #
> >> >> #other attached packages:
> >> >> # [1] plyr_1.2.1 zoo_1.6-4 gdata_2.8.1 rj_0.5.0-5
> >> >> #
> >> >> #loaded via a namespace (and not attached):
> >> >> # [1] grid_2.12.0 gtools_2.6.2 lattice_0.19-13 rJava_0.8-8
> >> >> #[5] tools_2.12.0
> >> >>
> >> >> I tried running that example in a different version of R, with and I got
> >> >> completely different results
> >> >>
> >> >> The other version of R wouldn't recognize the formula at all..
> >> >>
> >> >> My other version of R:
> >> >>
> >> >> # My second session info
> >> >> #> sessionInfo()
> >> >> #R version 2.10.1 (2009-12-14)
> >> >> #i386-pc-mingw32
> >> >> #
> >> >> #locale:
> >> >> # [1] LC_COLLATE=English_United States.1252
> >> >> #[2] LC_CTYPE=English_United States.1252
> >> >> #[3] LC_MONETARY=English_United States.1252
> >> >> #[4] LC_NUMERIC=C
> >> >> #[5] LC_TIME=English_United States.1252
> >> >> #
> >> >> #attached base packages:
> >> >> # [1] stats graphics grDevices utils datasets methods base
> >> >> #>
> >> >> #
> >> >>
> >> >> PS: Also, I have read the help on aggregate, factor, as.factor, and
> >> >> several other topics. If I missed something, please let me know.
> >> >> Some people like to reply to questions by telling the sender that R has
> >> >> documentation. Please don't. The R help archives are littered with
> >> >> reminders, friendly and otherwise, of R's documentation.
> >> >>
> >> >> [[alternative HTML version deleted]]
> >> >>
> >> >> ______________________________________________
> >> >> R-help@r-project.org mailing list
> >> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> >> PLEASE do read the posting guide
> >> >> http://www.R-project.org/posting-guide.html
> >> >> and provide commented, minimal, self-contained, reproducible code.
> >> >
> >> > --
> >> > Ista Zahn
> >> > Graduate student
> >> > University of Rochester
> >> > Department of Clinical and Social Psychology
> >> > http://yourpsyche.org
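
For anyone finding this in the archives, here is a small sketch of the behaviour described at the top of this message. The toy data frame and the names toy, toy_na, agg_pass and agg_na are made up for illustration and are not part of the original example. It assumes that aggregate() drops rows whose grouping value is NA even with na.action=na.pass, and that making NA an explicit factor level with addNA() on a copy of the data (so the original columns stay as they are) is one possible way to keep those rows. I have not checked this against every R version, so treat it as a sketch rather than the definitive answer.

```r
# Toy data (hypothetical): two rows have NA in the grouping column g,
# and one of those also has NA in the aggregated column y.
toy <- data.frame(
  g = c("a", "a", "b", NA, NA),
  y = c(1, 2, 3, 4, NA),
  stringsAsFactors = FALSE
)

# Total of y over all rows, ignoring only the NA in y itself.
total <- sum(toy$y, na.rm = TRUE)

# na.pass keeps every row in the model frame, but rows whose grouping
# value is NA still form no group, so their y values are lost.
agg_pass <- aggregate(y ~ g, data = toy, FUN = sum,
                      na.rm = TRUE, na.action = na.pass)

# Work on a copy and make NA an explicit factor level with addNA();
# the rows with missing g then form a group of their own.
toy_na <- toy
toy_na$g <- addNA(toy_na$g)
agg_na <- aggregate(y ~ g, data = toy_na, FUN = sum,
                    na.rm = TRUE, na.action = na.pass)

print(total)
print(sum(agg_pass$y))
print(sum(agg_na$y))
```

If this behaves as assumed, only the last sum should match the plain total, and the missing group in agg_na should print as <NA> rather than as the string 'NA'. Whether a temporary copy like toy_na is acceptable depends on the "other complications" mentioned above.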