Any of those would work. I wish ave() did that part of the job; I don't
think there is any reason it shouldn't. The following only needs to call
FUN three times, not 9:

> z <- ave(LETTERS[1:3], 1:3, 1:3, FUN=function(x)print(x))
[1] "A"
character(0)
character(0)
character(0)
[1] "B"
character(0)
character(0)
character(0)
[1] "C"
> z
[1] "A" "B" "C"
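Here is a small sketch of the point above. The helper count_fun_calls() is hypothetical; it is not part of base R or the thread, and it only counts how often ave() invokes FUN. It compares passing two grouping vectors directly with the three single-key alternatives from Bert's message. The transcript above shows nine calls for the first form. Each alternative hands ave() a key that contains only the observed combinations, so each should need only three calls.

```r
# Hypothetical helper (not from the thread): count how many times ave()
# calls FUN for a given set of grouping arguments.
count_fun_calls <- function(x, ...) {
  n_calls <- 0L
  ave(x, ..., FUN = function(v) {
    n_calls <<- n_calls + 1L  # record one invocation of FUN
    v                          # return the group unchanged
  })
  n_calls
}

x  <- LETTERS[1:3]
v1 <- 1:3
v2 <- 1:3

# Two grouping vectors: ave() builds interaction(v1, v2) internally, which
# has 3 x 3 = 9 levels, including combinations that never occur.
calls_raw <- count_fun_calls(x, v1, v2)

# The three alternatives discussed in the thread: each passes ave() a
# single grouping key containing only the 3 observed combinations.
calls_droplevels <- count_fun_calls(x, droplevels(interaction(v1, v2)))
calls_drop_arg   <- count_fun_calls(x, interaction(v1, v2, drop = TRUE))
calls_paste      <- count_fun_calls(x, paste(v1, v2))

c(raw = calls_raw, droplevels = calls_droplevels,
  drop_arg = calls_drop_arg, paste = calls_paste)
```

The paste() key works here because a character vector becomes a factor with only the values that actually occur.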
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----
> From: Bert Gunter [mailto:gunter.ber...@gene.com]
> Sent: Wednesday, July 25, 2012 3:04 PM
> To: Rui Barradas
> Cc: William Dunlap; r-help
> Subject: Re: [R] Select rows based on matching conditions and logical operators
>
> Wouldn't
>
>   interaction(..., drop=TRUE)
>
> be the same, but terser in this situation?
>
> Also, I tend to use paste() for this, i.e., instead of
>
>   interaction(v1, v2, drop=TRUE)
>
> simply
>
>   paste(v1, v2)
>
> Again, this seems shorter and simpler, but are there good reasons to
> prefer interaction()?
>
> Cheers,
> Bert
>
> On Wed, Jul 25, 2012 at 2:51 PM, Rui Barradas <ruipbarra...@sapo.pt> wrote:
> > Hello,
> >
> > You're right, thanks.
> > In my solution, I had tried to stay as close to the OP's code as possible.
> > A glance at it made me realize that a single change would do the job, and
> > that was it; I wasn't worried about performance.
> > I particularly liked the interaction/droplevels trick.
> >
> > Rui Barradas
> >
> > On 25-07-2012 22:13, William Dunlap wrote:
> >>
> >> Rui,
> >> Your solution works, but it can be faster for large data.frames if you
> >> compute the indices of the desired rows of the input data.frame and then
> >> use one subscripting call to select the rows, instead of splitting the
> >> input data.frame into a list of data.frames, extracting the desired row
> >> from each component, and then calling rbind to put the rows together again.
> >> E.g., compare your approach, which I've put into the function f1,
> >>   f1 <- function (dataFrame) {
> >>       retval <- with(dataFrame, sapply(split(dataFrame, list(PTID, Year)),
> >>           function(x) if (nrow(x)) x[which.max(x$Count), ]))
> >>       retval <- do.call(rbind, retval)
> >>       rownames(retval) <- 1:nrow(retval)
> >>       retval
> >>   }
> >> with one that computes a logical subscripting vector (by splitting just
> >> the Count vector, not the whole data.frame):
> >>   f2 <- function (dataFrame) {
> >>       keep <- as.logical(ave(dataFrame$Count,
> >>           droplevels(interaction(dataFrame$PTID, dataFrame$Year)),
> >>           FUN = function(x) if (length(x)) seq_along(x) == which.max(x)))
> >>       dataFrame[keep, ]
> >>   }
> >>
> >> They both compute the same thing, aside from the fact that the rows
> >> are in a different order (f2 keeps the order of the original data.frame)
> >> and f2 keeps each row's original row label.
> >> > f1(df1)
> >>   PGID  PTID Year Visit Count
> >> 1 6755 53122 2008     3     1
> >> 2 6755 53121 2009     1     0
> >> 3 6755 53122 2009     3     2
> >> > f2(df1)
> >>   PGID  PTID Year Visit Count
> >> 1 6755 53121 2009     1     0
> >> 6 6755 53122 2008     3     1
> >> 9 6755 53122 2009     3     2
> >> When there are a lot of output rows, f2 can be quite a bit faster.
> >>
> >> (I put the call to droplevels(interaction(...)) into the call to ave
> >> because ave can waste a lot of time calling FUN for nonexistent
> >> interaction levels.)
> >>
> >> Bill Dunlap
> >> Spotfire, TIBCO Software
> >> wdunlap tibco.com
> >>
> >>
> >>> -----Original Message-----
> >>> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
> >>> On Behalf Of Rui Barradas
> >>> Sent: Wednesday, July 25, 2012 10:24 AM
> >>> To: kborgmann
> >>> Cc: r-help
> >>> Subject: Re: [R] Select rows based on matching conditions and logical operators
> >>>
> >>> Hello,
> >>>
> >>> Apart from the output order, this does it.
> >>> (I have changed 'df' to 'df1'; 'df' is an R function, the F distribution
> >>> density.)
> >>>
> >>> df1 <- read.table(text="
> >>> PGID PTID Year Visit Count
> >>> 6755 53121 2009 1 0
> >>> 6755 53121 2009 2 0
> >>> 6755 53121 2009 3 0
> >>> 6755 53122 2008 1 0
> >>> 6755 53122 2008 2 0
> >>> 6755 53122 2008 3 1
> >>> 6755 53122 2009 1 0
> >>> 6755 53122 2009 2 1
> >>> 6755 53122 2009 3 2", header=TRUE)
> >>>
> >>> df2 <- with(df1, sapply(split(df1, list(PTID, Year)),
> >>>     function(x) if (nrow(x)) x[which.max(x$Count), ]))
> >>> df2 <- do.call(rbind, df2)
> >>> rownames(df2) <- 1:nrow(df2)
> >>> df2
> >>>
> >>> Use which.max(), not which().
> >>>
> >>> Hope this helps,
> >>>
> >>> Rui Barradas
> >>> On 25-07-2012 18:10, kborgmann wrote:
> >>>>
> >>>> Hi,
> >>>> I have a dataset in which I would like to select rows based on matching
> >>>> conditions and return the row with the maximum value of a variable, or a
> >>>> single row if duplicate counts exist. My dataset looks like this:
> >>>>   PGID  PTID Year Visit Count
> >>>>   6755 53121 2009     1     0
> >>>>   6755 53121 2009     2     0
> >>>>   6755 53121 2009     3     0
> >>>>   6755 53122 2008     1     0
> >>>>   6755 53122 2008     2     0
> >>>>   6755 53122 2008     3     1
> >>>>   6755 53122 2009     1     0
> >>>>   6755 53122 2009     2     1
> >>>>   6755 53122 2009     3     2
> >>>>
> >>>> For each matching PTID and Year, I would like to return the row with the
> >>>> maximum count, or a single row if the counts are the same, so that I get
> >>>> this output:
> >>>>   PGID  PTID Year Visit Count
> >>>>   6755 53121 2009     1     0
> >>>>   6755 53122 2008     3     1
> >>>>   6755 53122 2009     3     2
> >>>>
> >>>> I tried the following code, and the output is almost correct, but
> >>>> duplicate rows were included:
> >>>> df2 <- with(df, sapply(split(df, list(PTID, Year)),
> >>>>     function(x) if (nrow(x)) x[which(x$Count == max(x$Count)), ]))
> >>>> df2 <- do.call(rbind, df2)
> >>>> rownames(df2) <- 1:nrow(df2)
> >>>>
> >>>> Any suggestions?
> >>>> Thanks much for your responses!
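The difference between the original attempt and Rui's fix comes down to ties. Below is a minimal illustration using the three Count values of the PTID 53121 / Year 2009 group, which are all zero. The small data frame grp is made up for the example.

```r
# Count values of the PTID 53121 / Year 2009 group: a three-way tie.
counts <- c(0, 0, 0)

# which() returns every position equal to the maximum, so a tie yields
# several rows (the duplicates seen in the original attempt).
all_max <- which(counts == max(counts))

# which.max() returns only the first position holding the maximum,
# so exactly one row per group is kept.
first_max <- which.max(counts)

all_max    # 1 2 3
first_max  # 1

# The same effect when subscripting a (made-up) one-group data frame.
grp <- data.frame(Visit = 1:3, Count = counts)
nrow(grp[which(grp$Count == max(grp$Count)), ])  # 3
nrow(grp[which.max(grp$Count), ])                # 1
```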
> >>>>
> >>>> --
> >>>> View this message in context:
> >>>> http://r.789695.n4.nabble.com/Select-rows-based-on-matching-conditions-and-logical-operators-tp4637809.html
> >>>> Sent from the R help mailing list archive at Nabble.com.
> >>>>
> >>>> ______________________________________________
> >>>> R-help@r-project.org mailing list
> >>>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>>> PLEASE do read the posting guide
> >>>> http://www.R-project.org/posting-guide.html
> >>>> and provide commented, minimal, self-contained, reproducible code.
>
>
>
> --
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
>
> Internal Contact Info:
> Phone: 467-7374
> Website:
> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
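Here is one possible reason to prefer interaction(), offered as an illustration rather than as an answer given in the thread. paste() builds a character key, so two different pairs can collapse into the same key when the separator also occurs in the data. The vectors v1 and v2 below are hypothetical and are not the thread's data.

```r
# Hypothetical key columns whose values contain spaces.
v1 <- c("a b", "a")
v2 <- c("c",   "b c")

# paste() with its default sep = " " turns both distinct pairs into "a b c".
key_paste <- paste(v1, v2)

# interaction() labels levels with sep = "." by default, so in this example
# the two pairs stay distinct ("a b.c" and "a.b c").
key_inter <- interaction(v1, v2, drop = TRUE)

# Assumption: "|" never occurs in either column, so it is a safe separator.
key_safe <- paste(v1, v2, sep = "|")

length(unique(key_paste))  # 1: the two pairs were merged
nlevels(key_inter)         # 2
length(unique(key_safe))   # 2
```

With the thread's numeric PTID and Year columns, the values never contain a space, so paste(PTID, Year) cannot collide there.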