Well, I obviously don't use it either, as I'm just quoting the docs. I use either by() or tapply().
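For illustration, here is a minimal sketch of what the by() and tapply() routes might look like. It assumes the sample data frame and the data-frame version of d_rule quoted further down the thread; the names res_by and res_tapply are made up for this example.

```r
# Sample data from the thread, built with data.frame() rather than cbind().
dat <- data.frame(
  a = c(2, 2, 1, 4, 2, 5, 2, 3, 4, 4),
  x = 1:10,
  g = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5)
)

# Rule from the thread, taking a whole sub-data-frame: if the maximum of a
# is unique, return the matching x; otherwise return the smallest x.
d_rule <- function(DF) {
  i <- which(DF$a == max(DF$a))
  if (length(i) == 1) DF$x[i] else min(DF$x)
}

# by() splits the rows of dat by g and hands each piece to FUN as a data frame.
res_by <- by(dat, dat$g, d_rule)

# tapply() only hands FUN a vector, so split the row indices instead and
# look the rows up inside the function (res_tapply is a hypothetical name).
res_tapply <- tapply(seq_len(nrow(dat)), dat$g,
                     function(idx) d_rule(dat[idx, ]))

print(res_by)
print(res_tapply)  # 1 4 6 8 9 for groups 1 to 5
```

Both calls pass d_rule everything it needs for one group at a time, which is the part aggregate() cannot do because it gives FUN a single column vector per call.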
-- Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
Clifford Stoll

On Thu, Mar 5, 2015 at 10:47 AM, Jeff Newmiller <jdnew...@dcn.davis.ca.us> wrote:
> Bert: using the sample data frame from below, try to interpret the output of
> this:
>
> aggregate( dat[,1:2], dat[,"g",drop=FALSE], FUN=function(x){print(x);class(x)})
>
> The help text you quote is probably not as clear as it should be. Would the
> following be better?
>
> "... and FUN is applied to each column in each such subset with further
> arguments in ... passed to it."
>
> I became aware of this "feature" because this application of exactly the same
> aggregation function to all of my data columns is not convenient for my
> day-to-day work. Thus, I don't use "aggregate" very often.
> ---------------------------------------------------------------------------
> Jeff Newmiller                        The     .....       .....  Go Live...
> DCN:<jdnew...@dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
>                                       Live:   OO#.. Dead: OO#..  Playing
> Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
> /Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
> ---------------------------------------------------------------------------
> Sent from my phone. Please excuse my brevity.
>
> On March 5, 2015 8:59:55 AM PST, Bert Gunter <gunter.ber...@gene.com> wrote:
>>That's not what ?aggregate says:
>>
>>"aggregate.data.frame is the data frame method. If x is not a data
>>frame, it is coerced to one, which must have a non-zero number of
>>rows. Then, each of the variables (columns) in x is split into subsets
>>of cases (rows) of identical combinations of the components of by, and
>>FUN is applied to each such subset with further arguments in ...
>>passed to it."
>>
>>As I read this, the argument of FUN is a data frame that is a subset
>>of the original frame, defined by the by variable values.
>>
>>No?
>>
>>-- Bert
>>
>>On Thu, Mar 5, 2015 at 8:55 AM, Jeff Newmiller
>><jdnew...@dcn.davis.ca.us> wrote:
>>> I don't see your point. No matter which version of aggregate you use,
>>> FUN is applied to vectors. Those vectors may be columns in a data frame
>>> or not, but FUN is always given one vector at a time by aggregate.
>>>
>>> On March 5, 2015 8:12:39 AM PST, Bert Gunter <gunter.ber...@gene.com>
>>> wrote:
>>>>Sorry, Jeff. aggregate() is generic.
>>>>
>>>>From ?aggregate:
>>>>
>>>>"## S3 method for class 'data.frame'
>>>>aggregate(x, by, FUN, ..., simplify = TRUE)"
>>>>
>>>>Cheers,
>>>>Bert
>>>>
>>>>On Thu, Mar 5, 2015 at 7:54 AM, Jeff Newmiller
>>>><jdnew...@dcn.davis.ca.us> wrote:
>>>>> The aggregate function applies FUN to vectors, not data frames. For
>>>>> example, the default "mean" function accepts a vector such as a column
>>>>> in a data frame and returns a scalar (well, a vector of length 1).
>>>>> Aggregate then calls this function once for each piece of the column(s)
>>>>> you give it.
>>>>> Your function wants two vectors, but aggregate does not
>>>>> understand how to give two inputs.
>>>>>
>>>>> (In the future, please follow R-help mailing list guidelines and post
>>>>> using plain text so your code does not get messed up.)
>>>>>
>>>>> You could use split to break your data frame into a list of data
>>>>> frames, and then sapply to extract the results you are looking for. I
>>>>> prefer to use the plyr or dplyr or data.table packages to do all this
>>>>> for me.
>>>>>
>>>>> d_rule <- function( DF ) {
>>>>>   i <- which( DF$a==max( DF$a ) )
>>>>>   if ( length( i ) == 1 ){
>>>>>     DF[ i, "x" ]
>>>>>   } else {
>>>>>     min( DF[ , "x" ] ) # did you mean min( DF$x[i] ) ?
>>>>>   }
>>>>> }
>>>>>
>>>>> dat <- data.frame( a=c(2,2,1,4,2,5,2,3,4,4)
>>>>>                  , x = c(1:10)
>>>>>                  , g = c(1,1,2,2,3,3,4,4,5,5)
>>>>>                  )
>>>>> # note that cbind on vectors creates a matrix
>>>>> # in a matrix all columns must be of the same type
>>>>> # but data frames generally have a variety of types
>>>>> # so don't use cbind when making a data frame
>>>>>
>>>>> library( dplyr )
>>>>>
>>>>> result <- dat %>% group_by( g ) %>% do( answer = d_rule( . ) ) %>%
>>>>>   as.data.frame
>>>>>
>>>>> On March 4, 2015 2:02:06 PM PST, Typhenn Brichieri-Colombi via R-help
>>>>> <r-help@r-project.org> wrote:
>>>>>>Hello,
>>>>>>
>>>>>>I am trying to use the following custom function in an
>>>>>>aggregate function, but cannot get R to recognize my data.
>>>>>>I’ve read the
>>>>>>help on function() and on aggregate() but am unable to solve my problem.
>>>>>>How can I get R to recognize the data inputs for the custom function
>>>>>>nested within aggregate()?
>>>>>>
>>>>>>My custom function is found below, as well as the error message I get
>>>>>>when I run it on a test data set (I will be using this function on a
>>>>>>much larger dataset (over 600,000 rows))
>>>>>>
>>>>>>Thank you for your time and your help!
>>>>>>
>>>>>>d_rule<-function(a,x){
>>>>>>  i<-which(a==max(a))
>>>>>>  out<-ifelse(length(i)==1, x[i], min(x))
>>>>>>  return(out)
>>>>>>}
>>>>>>
>>>>>>a<-c(2,2,1,4,2,5,2,3,4,4)
>>>>>>x<-c(1:10)
>>>>>>g<-c(1,1,2,2,3,3,4,4,5,5)
>>>>>>dat<-as.data.frame(cbind(x,g))
>>>>>>
>>>>>>test<-aggregate(dat, by=list(g), FUN=d_rule,dat$a, dat$x)
>>>>>>
>>>>>>Error in dat$x : $ operator is invalid for atomic vectors
>>>>>>
>>>>>>        [[alternative HTML version deleted]]
>>>>>>
>>>>>>______________________________________________
>>>>>>R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>>>https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>PLEASE do read the posting guide
>>>>>>http://www.R-project.org/posting-guide.html
>>>>>>and provide commented, minimal, self-contained, reproducible code.