Oops. For clarity, that should have been

sum(ddply(dat, .(x1,x2,x3,x4), function(x){data.frame(y.sum=sum(x$y, na.rm=TRUE))})$y.sum)
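
For the archives, here is a minimal base-R sketch of what this thread turned up, using a small made-up data frame (the names `toy`, `g` and `y` are illustrative and not from the original data). It assumes a reasonably recent R, where aggregate() drops rows whose grouping variable is NA even when na.action=na.pass is given, and it uses addNA() to turn NA into an explicit factor level as one possible workaround. Treat it as a sketch rather than the definitive way to do this.

```r
# Toy data (hypothetical): one grouping column with NAs, one value column with an NA
toy <- data.frame(
  g = c("a", "a", "b", NA, NA),
  y = c(1, 2, NA, 4, 8),
  stringsAsFactors = FALSE
)

# Total over all rows, ignoring only the missing y values (1 + 2 + 4 + 8)
total_raw <- sum(toy$y, na.rm = TRUE)

# Formula interface with na.pass: the NA in y reaches sum() and is removed
# there, but the rows where the grouping variable g is NA are still dropped
agg_formula <- aggregate(y ~ g, data = toy, FUN = sum,
                         na.rm = TRUE, na.action = na.pass)

# Workaround: make NA an explicit factor level so those rows form a group
agg_factor <- aggregate(toy$y, by = list(g = addNA(toy$g)),
                        FUN = sum, na.rm = TRUE)

total_raw            # total over every row
sum(agg_formula$y)   # same as sum(na.omit(toy)$y)
sum(agg_factor$x)    # matches total_raw again
```

The point of the addNA() version is that the grouping column stays a real NA in the data, and only the copy handed to aggregate() carries the extra level.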
-Ista

On Fri, Feb 4, 2011 at 7:52 PM, Ista Zahn <iz...@psych.rochester.edu> wrote:
> Hi again,
>
> On Fri, Feb 4, 2011 at 7:18 PM, Gene Leynes <gleyne...@gmail.com> wrote:
>> Ista,
>>
>> Thank you again.
>>
>> I had figured that out... and was crafting another message when you replied.
>>
>> The NAs do come through on the variable that is being aggregated.
>> However, they do not come through on the categorical variable(s).
>>
>> The aggregate function must be converting the data frame variables to
>> factors, with the default "omit=NA" parameter.
>>
>> The help on "aggregate" says:
>> na.action    A function which indicates what should happen when the data
>>              contain NA values.
>>              The default is to ignore missing values in the given
>>              variables.
>> By "data" it must only refer to the aggregated variable, and not the
>> categorical variables.  I thought it referred to both, because I thought it
>> referred to the "data" argument, which is the underlying data frame.
>>
>> I think the proper way to accomplish this would be to recast my x
>> (categorical) variables as factors.
>
> Yes, that would work.
>
>> This is not feasible for me due to other complications.
>> Also, (imho) the help should be more clear about what the na.action
>> modifies.
>>
>> So, unless someone has a better idea, I guess I'm out of luck?
>
> Well, you can use ddply from the plyr package:
>
> library(plyr) # may need to install first.
> sum(ddply(dat, .(x1,x2,x3,x4), function(x){data.frame(y.sum=sum(x$y,
> na.rm=TRUE))})$y)
>
> However, I don't think you've told us what you're actually trying to
> accomplish...
>
> Best,
> Ista
>
>>
>>
>> On Fri, Feb 4, 2011 at 6:05 PM, Ista Zahn <iz...@psych.rochester.edu> wrote:
>>>
>>> Hi,
>>>
>>> On Fri, Feb 4, 2011 at 6:33 PM, Gene Leynes <gleyne...@gmail.com> wrote:
>>> > Thank you both for the thoughtful (and funny) replies.
>>> >
>>> > I agree with both of you that sum is the one picking up aggregate.
>>> > Although I didn't mention it, I did realize that in the first place.
>>> > Also, thank you Phil for pointing out that aggregate only accepts a
>>> > formula value in more recent versions!  I actually thought that was an
>>> > older feature, but I must be thinking of other functions.
>>> >
>>> > I still don't see why these two values are not the same!
>>> >
>>> > It seems like a bug to me
>>>
>>> No, not a bug (see below).
>>>
>>> >
>>> > > set.seed(100)
>>> > > dat=data.frame(
>>> > +   x1=sample(c(NA,'m','f'), 100, replace=TRUE),
>>> > +   x2=sample(c(NA, 1:10), 100, replace=TRUE),
>>> > +   x3=sample(c(NA,letters[1:5]), 100, replace=TRUE),
>>> > +   x4=sample(c(NA,T,F), 100, replace=TRUE),
>>> > +   y=sample(c(rep(NA,5), rnorm(95))))
>>> > > sum(dat$y, na.rm=T)
>>> > [1] 0.0815244116598
>>> > > sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.pass, na.rm=T)$y)
>>> > [1] -4.45087666247
>>> > >
>>>
>>> Because in the first one you are only removing missing data in dat$y.
>>> In the second one you are removing all rows that contain missing data
>>> in any of the columns.
>>>
>>> all.equal(sum(na.omit(dat)$y), sum(aggregate(y~x1+x2+x3+x4, data=dat,
>>> sum, na.action=na.pass, na.rm=T)$y))
>>> [1] TRUE
>>>
>>> Best,
>>> Ista
>>>
>>> >
>>> > On Fri, Feb 4, 2011 at 4:18 PM, Ista Zahn <iz...@psych.rochester.edu>
>>> > wrote:
>>> >>
>>> >> Sorry, I didn't see Phil's reply, which is better than mine anyway.
>>> >>
>>> >> -Ista
>>> >>
>>> >> On Fri, Feb 4, 2011 at 5:16 PM, Ista Zahn <iz...@psych.rochester.edu>
>>> >> wrote:
>>> >> > Hi,
>>> >> >
>>> >> > Please see ?na.action
>>> >> >
>>> >> > (just kidding!)
>>> >> >
>>> >> > So it seems to me the problem is that you are passing na.rm to the
>>> >> > sum function. So there is no missing data for the na.action argument
>>> >> > to operate on!
>>> >> >
>>> >> > Compare
>>> >> >
>>> >> > sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.fail)$y)
>>> >> > sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.pass)$y)
>>> >> > sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.omit)$y)
>>> >> >
>>> >> >
>>> >> > Best,
>>> >> > Ista
>>> >> >
>>> >> > On Fri, Feb 4, 2011 at 4:07 PM, Gene Leynes <gleyne...@gmail.com>
>>> >> > wrote:
>>> >> >> Can someone please tell me what is up with na.action in aggregate?
>>> >> >>
>>> >> >> My (somewhat) reproducible example:
>>> >> >> (I say somewhat because some lines wouldn't run in a separate
>>> >> >> session, more below)
>>> >> >>
>>> >> >> set.seed(100)
>>> >> >> dat=data.frame(
>>> >> >>   x1=sample(c(NA,'m','f'), 100, replace=TRUE),
>>> >> >>   x2=sample(c(NA, 1:10), 100, replace=TRUE),
>>> >> >>   x3=sample(c(NA,letters[1:5]), 100, replace=TRUE),
>>> >> >>   x4=sample(c(NA,T,F), 100, replace=TRUE),
>>> >> >>   y=sample(c(rep(NA,5), rnorm(95))))
>>> >> >> dat
>>> >> >> ## The total from dat:
>>> >> >> sum(dat$y, na.rm=T)
>>> >> >> ## The total from aggregate:
>>> >> >> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T)$x)
>>> >> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T)$y)  ## <--- This line gave an error in a separate R instance
>>> >> >> ## The aggregate formula is excluding NA
>>> >> >>
>>> >> >> ## So, let's try to include NAs
>>> >> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action='na.pass')$y)
>>> >> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action=na.pass)$y)
>>> >> >> ## The aggregate formula is STILL excluding NA
>>> >> >> ## In fact, the formula doesn't seem to notice the na.action
>>> >> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action='foo man chew')$y)
>>> >> >> ## Hmmmm...
>>> >> >> ## that error surprised me (since the previous two things ran)
>>> >> >>
>>> >> >> ## So, let's try to change the global options
>>> >> >> ## (not mentioned in the help, but after reading the help
>>> >> >> ##  100 times, I thought I would go above and beyond to avoid
>>> >> >> ##  any r list flames from people complaining
>>> >> >> ##  that I didn't read the help... but that's a separate topic)
>>> >> >> options(na.action ="na.pass")
>>> >> >> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T)$x)
>>> >> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T)$y)
>>> >> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action='na.pass')$y)
>>> >> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action=na.pass)$y)
>>> >> >> ## (NAs are still omitted)
>>> >> >>
>>> >> >> ## Even more frustrating...
>>> >> >> ## Why don't any of these work???
>>> >> >> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action='na.pass')$x)
>>> >> >> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action=na.pass)$x)
>>> >> >> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action='na.omit')$x)
>>> >> >> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action=na.omit)$x)
>>> >> >>
>>> >> >>
>>> >> >> ## This does work, but in my real data set, I want NA to really be NA
>>> >> >> for(j in 1:4)
>>> >> >>   dat[is.na(dat[,j]),j] = 'NA'
>>> >> >> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T)$x)
>>> >> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T)$y)
>>> >> >>
>>> >> >>
>>> >> >> ## My first session info
>>> >> >> #
>>> >> >> #> sessionInfo()
>>> >> >> #R version 2.12.0 (2010-10-15)
>>> >> >> #Platform: i386-pc-mingw32/i386 (32-bit)
>>> >> >> #
>>> >> >> #locale:
>>> >> >> # [1] LC_COLLATE=English_United States.1252
>>> >> >> #[2] LC_CTYPE=English_United States.1252
>>> >> >> #[3] LC_MONETARY=English_United States.1252
>>> >> >> #[4] LC_NUMERIC=C
>>> >> >> #[5] LC_TIME=English_United States.1252
>>> >> >> #
>>> >> >> #attached base packages:
>>> >> >> # [1] stats     graphics  grDevices utils     datasets  methods   base
>>> >> >> #
>>> >> >> #other attached packages:
>>> >> >> # [1] plyr_1.2.1  zoo_1.6-4   gdata_2.8.1 rj_0.5.0-5
>>> >> >> #
>>> >> >> #loaded via a namespace (and not attached):
>>> >> >> # [1] grid_2.12.0     gtools_2.6.2    lattice_0.19-13 rJava_0.8-8
>>> >> >> #[5] tools_2.12.0
>>> >> >>
>>> >> >>
>>> >> >> I tried running that example in a different version of R, and I got
>>> >> >> completely different results
>>> >> >>
>>> >> >> The other version of R wouldn't recognize the formula at all..
>>> >> >>
>>> >> >> My other version of R:
>>> >> >>
>>> >> >> # My second session info
>>> >> >> #> sessionInfo()
>>> >> >> #R version 2.10.1 (2009-12-14)
>>> >> >> #i386-pc-mingw32
>>> >> >> #
>>> >> >> #locale:
>>> >> >> # [1] LC_COLLATE=English_United States.1252
>>> >> >> #[2] LC_CTYPE=English_United States.1252
>>> >> >> #[3] LC_MONETARY=English_United States.1252
>>> >> >> #[4] LC_NUMERIC=C
>>> >> >> #[5] LC_TIME=English_United States.1252
>>> >> >> #
>>> >> >> #attached base packages:
>>> >> >> # [1] stats     graphics  grDevices utils     datasets  methods   base
>>> >> >> #>
>>> >> >> #
>>> >> >>
>>> >> >> PS:  Also, I have read the help on aggregate, factor, as.factor, and
>>> >> >> several other topics.  If I missed something, please let me know.
>>> >> >> Some people like to reply to questions by telling the sender that R
>>> >> >> has documentation.  Please don't.  The R help archives are littered
>>> >> >> with reminders, friendly and otherwise, of R's documentation.
>>> >> >>
>>> >> >>        [[alternative HTML version deleted]]
>>> >> >>
>>> >> >> ______________________________________________
>>> >> >> R-help@r-project.org mailing list
>>> >> >> https://stat.ethz.ch/mailman/listinfo/r-help
>>> >> >> PLEASE do read the posting guide
>>> >> >> http://www.R-project.org/posting-guide.html
>>> >> >> and provide commented, minimal, self-contained, reproducible code.
>>> >> >>
>>> >> >
>>> >> > --
>>> >> > Ista Zahn
>>> >> > Graduate student
>>> >> > University of Rochester
>>> >> > Department of Clinical and Social Psychology
>>> >> > http://yourpsyche.org

--
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.