Another way to drop the unneeded interaction levels is to supply
drop=TRUE to ave():

  > z <- ave(LETTERS[1:3], 1:3, 1:3, FUN=function(x)print(x), drop=TRUE)
  [1] "A"
  [1] "B"
  [1] "C"
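
To make the saving concrete, here is a minimal sketch that counts how often ave() calls FUN with and without drop=TRUE. It assumes ave() forwards its unnamed/extra arguments to interaction(), which is how drop=TRUE takes effect in the call above; count_calls is a hypothetical helper written only for this illustration.

```r
# count_calls() is a hypothetical helper: it runs ave() on the same toy
# input as above and returns how many times FUN was invoked.
count_calls <- function(...) {
  n <- 0L
  ave(LETTERS[1:3], 1:3, 1:3,
      FUN = function(x) {
        n <<- n + 1L   # bump the counter in count_calls()'s frame
        x              # return the group unchanged
      },
      ...)             # extra arguments (e.g. drop=TRUE) go on to ave()
  n
}

calls_default <- count_calls()             # every interaction level visited
calls_drop    <- count_calls(drop = TRUE)  # only levels present in the data

cat("default:", calls_default, " drop=TRUE:", calls_drop, "\n")
```

With drop=TRUE only the three level combinations that occur in the data are visited. The printed output quoted in this thread suggests nine calls for the default, but that count may depend on the R version, so it is worth running the sketch rather than assuming it.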

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
> Behalf Of William Dunlap
> Sent: Wednesday, July 25, 2012 3:37 PM
> To: Bert Gunter; Rui Barradas
> Cc: r-help
> Subject: Re: [R] Select rows based on matching conditions and logical
> operators
>
> Any of those would work.  I wish ave() did that part of the job.
> I don't think there is any reason it shouldn't.  The following only
> needs to call FUN three times, not 9:
>   > z <- ave(LETTERS[1:3], 1:3, 1:3, FUN=function(x)print(x))
>   [1] "A"
>   character(0)
>   character(0)
>   character(0)
>   [1] "B"
>   character(0)
>   character(0)
>   character(0)
>   [1] "C"
>   > z
>   [1] "A" "B" "C"
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
> > -----Original Message-----
> > From: Bert Gunter [mailto:gunter.ber...@gene.com]
> > Sent: Wednesday, July 25, 2012 3:04 PM
> > To: Rui Barradas
> > Cc: William Dunlap; r-help
> > Subject: Re: [R] Select rows based on matching conditions and logical
> > operators
> >
> > Wouldn't
> >
> > > interaction(..., drop=TRUE)
> >
> > be the same, but terser in this situation?
> >
> > Also I tend to use paste() for this, i.e. instead of
> >
> > > interaction(v1,v2, drop=TRUE)
> >
> > simply
> >
> > > paste(v1,v2)
> >
> > Again, this seems shorter and simpler -- but are there good reasons to
> > prefer the use of interaction()?
> >
> > Cheers,
> > Bert
> >
> > On Wed, Jul 25, 2012 at 2:51 PM, Rui Barradas <ruipbarra...@sapo.pt> wrote:
> > > Hello,
> > >
> > > You're right, thanks.
> > > In my solution, I had tried to keep to the op as much as possible. A
> > > glance at it made me realize that one change only would do the job,
> > > and that was it, no performance worries.
> > > I particularly liked the interaction/droplevels trick.
> > >
> > > Rui Barradas
> > >
> > > On 25-07-2012 22:13, William Dunlap wrote:
> > >>
> > >> Rui,
> > >>   Your solution works, but it can be faster for large data.frames if
> > >> you compute the indices of the desired rows of the input data.frame
> > >> and then use one subscripting call to select the rows, instead of
> > >> splitting the input data.frame into a list of data.frames, extracting
> > >> the desired row from each component, and then calling rbind to put
> > >> the rows together again.  E.g., compare your approach, which I've put
> > >> into the function f1
> > >>    f1 <- function (dataFrame) {
> > >>        retval <- with(dataFrame, sapply(split(dataFrame, list(PTID, Year)),
> > >>            function(x) if (nrow(x)) x[which.max(x$Count), ]))
> > >>        retval <- do.call(rbind, retval)
> > >>        rownames(retval) <- 1:nrow(retval)
> > >>        retval
> > >>    }
> > >> with one that computes a logical subscripting vector (by splitting
> > >> just the Count vector, not the whole data.frame)
> > >>    f2 <- function (dataFrame) {
> > >>        keep <- as.logical(ave(dataFrame$Count,
> > >>            droplevels(interaction(dataFrame$PTID, dataFrame$Year)),
> > >>            FUN = function(x) if (length(x)) seq_along(x) == which.max(x)))
> > >>        dataFrame[keep, ]
> > >>    }
> > >>
> > >> They both compute the same thing, aside from the fact that the rows
> > >> are in a different order (f2 keeps the order of the original data.frame)
> > >> and f2 leaves the original row label with the row.
> > >>    > f1(df1)
> > >>      PGID  PTID Year Visit Count
> > >>    1 6755 53122 2008     3     1
> > >>    2 6755 53121 2009     1     0
> > >>    3 6755 53122 2009     3     2
> > >>    > f2(df1)
> > >>      PGID  PTID Year Visit Count
> > >>    1 6755 53121 2009     1     0
> > >>    6 6755 53122 2008     3     1
> > >>    9 6755 53122 2009     3     2
> > >> When there are a lot of output rows, f2 can be quite a bit faster.
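
For reference, the index-based idea in f2 can also be written with drop=TRUE passed straight to ave(), as suggested at the top of this thread. This is only a sketch under that assumption, not a replacement for f2: keep_max_rows is a hypothetical name, and the data are the nine rows from the original question.

```r
# Example data copied from the original question in this thread.
df1 <- read.table(text = "
PGID PTID Year Visit Count
6755 53121 2009 1 0
6755 53121 2009 2 0
6755 53121 2009 3 0
6755 53122 2008 1 0
6755 53122 2008 2 0
6755 53122 2008 3 1
6755 53122 2009 1 0
6755 53122 2009 2 1
6755 53122 2009 3 2", header = TRUE)

# keep_max_rows() is a hypothetical variant of f2: within each PTID/Year
# group it flags the first row holding the maximum Count, then selects
# all flagged rows with a single subscripting call.
keep_max_rows <- function(dataFrame) {
  keep <- as.logical(ave(dataFrame$Count,
                         dataFrame$PTID, dataFrame$Year,
                         FUN = function(x) seq_along(x) == which.max(x),
                         drop = TRUE))  # skip PTID/Year pairs with no rows
  dataFrame[keep, ]
}

res <- keep_max_rows(df1)
print(res)
```

Because empty groups are never visited, FUN does not need the if (length(x)) guard. The selected rows should be the ones labelled 1, 6 and 9, matching the f2 output quoted above.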
> > >>
> > >> (I put the call to droplevels(interaction(...)) into the call to ave
> > >> because ave can waste a lot of time calling FUN for nonexistent
> > >> interaction levels.)
> > >>
> > >> Bill Dunlap
> > >> Spotfire, TIBCO Software
> > >> wdunlap tibco.com
> > >>
> > >>> -----Original Message-----
> > >>> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
> > >>> On Behalf Of Rui Barradas
> > >>> Sent: Wednesday, July 25, 2012 10:24 AM
> > >>> To: kborgmann
> > >>> Cc: r-help
> > >>> Subject: Re: [R] Select rows based on matching conditions and logical
> > >>> operators
> > >>>
> > >>> Hello,
> > >>>
> > >>> Apart from the output order this does it.
> > >>> (I have changed 'df' to 'df1', 'df' is an R function, the F distribution
> > >>> density.)
> > >>>
> > >>> df1 <- read.table(text="
> > >>> PGID PTID Year Visit Count
> > >>> 6755 53121 2009 1 0
> > >>> 6755 53121 2009 2 0
> > >>> 6755 53121 2009 3 0
> > >>> 6755 53122 2008 1 0
> > >>> 6755 53122 2008 2 0
> > >>> 6755 53122 2008 3 1
> > >>> 6755 53122 2009 1 0
> > >>> 6755 53122 2009 2 1
> > >>> 6755 53122 2009 3 2", header=TRUE)
> > >>>
> > >>> df2 <- with(df1, sapply(split(df1, list(PTID, Year)),
> > >>>     function(x) if (nrow(x)) x[which.max(x$Count), ]))
> > >>> df2 <- do.call(rbind, df2)
> > >>> rownames(df2) <- 1:nrow(df2)
> > >>> df2
> > >>>
> > >>> which.max(), not which().
> > >>>
> > >>> Hope this helps,
> > >>>
> > >>> Rui Barradas
> > >>> On 25-07-2012 18:10, kborgmann wrote:
> > >>>>
> > >>>> Hi,
> > >>>> I have a dataset in which I would like to select rows based on matching
> > >>>> conditions and return the maximum value of a variable else return one
> > >>>> row if duplicate counts exist.
> > >>>> My dataset looks like this:
> > >>>> PGID PTID Year Visit Count
> > >>>> 6755 53121 2009 1 0
> > >>>> 6755 53121 2009 2 0
> > >>>> 6755 53121 2009 3 0
> > >>>> 6755 53122 2008 1 0
> > >>>> 6755 53122 2008 2 0
> > >>>> 6755 53122 2008 3 1
> > >>>> 6755 53122 2009 1 0
> > >>>> 6755 53122 2009 2 1
> > >>>> 6755 53122 2009 3 2
> > >>>>
> > >>>> I would like to select rows if PTID and Year match and return the
> > >>>> maximum count else return one row if counts are the same, such that
> > >>>> I get this output
> > >>>> PGID PTID Year Visit Count
> > >>>> 6755 53121 2009 1 0
> > >>>> 6755 53122 2008 3 1
> > >>>> 6755 53122 2009 3 2
> > >>>>
> > >>>> I tried the following code and the output is almost correct but
> > >>>> duplicate values were included
> > >>>> df2<-with(df, sapply(split(df, list(PTID, Year)),
> > >>>> function(x) if (nrow(x)) x[which(x$Count==max(x$Count)),]))
> > >>>> df<-do.call(rbind,df)
> > >>>> rownames(df)<-1:nrow(df)
> > >>>>
> > >>>> Any suggestions?
> > >>>> Thanks much for your responses!
> > >>>>
> > >>>> --
> > >>>> View this message in context:
> > >>>> http://r.789695.n4.nabble.com/Select-rows-based-on-matching-conditions-and-logical-operators-tp4637809.html
> > >>>> Sent from the R help mailing list archive at Nabble.com.
> > >>>>
> > >>>> ______________________________________________
> > >>>> R-help@r-project.org mailing list
> > >>>> https://stat.ethz.ch/mailman/listinfo/r-help
> > >>>> PLEASE do read the posting guide
> > >>>> http://www.R-project.org/posting-guide.html
> > >>>> and provide commented, minimal, self-contained, reproducible code.
> >
> > --
> >
> > Bert Gunter
> > Genentech Nonclinical Biostatistics
> >
> > Internal Contact Info:
> > Phone: 467-7374
> > Website:
> > http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
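
On Bert's question about paste() versus interaction(..., drop=TRUE): for simple keys the two appear to define the same groups, as the sketch below shows on made-up vectors (v1, v2 and x are hypothetical example data, not from the thread). One thing to watch with paste() is that the pasted key can be ambiguous when the separator also occurs inside the values.

```r
# Hypothetical example data: two grouping vectors and a value vector.
v1 <- c("a", "a", "b", "b", "a")
v2 <- c("x", "y", "x", "x", "x")
x  <- c(1, 2, 3, 4, 5)

# Group sums via interaction(..., drop=TRUE) and via a pasted key.
by_interaction <- ave(x, interaction(v1, v2, drop = TRUE), FUN = sum)
by_paste       <- ave(x, paste(v1, v2), FUN = sum)

same_groups <- identical(by_interaction, by_paste)

# A pasted key can collide when the separator occurs in the data:
# ("a b", "c") and ("a", "b c") both paste to "a b c".
collision <- paste("a b", "c") == paste("a", "b c")

cat("same grouping:", same_groups, " paste collision:", collision, "\n")
```

If the values can contain spaces, choosing a separator that cannot occur in the data (the sep argument of paste()) avoids that particular ambiguity.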