Wouldn't

  > interaction(..., drop=TRUE)

be the same, but terser in this situation?

Also, I tend to use paste() for this, i.e. instead of

  > interaction(v1,v2, drop=TRUE)

simply

  > paste(v1,v2)

Again, this seems shorter and simpler -- but are there good reasons to
prefer the use of interaction()?

Cheers,
Bert

On Wed, Jul 25, 2012 at 2:51 PM, Rui Barradas <ruipbarra...@sapo.pt> wrote:
> Hello,
>
> You're right, thanks.
> In my solution, I had tried to stay as close to the OP's code as possible.
> A glance at it made me realize that a single change would do the job, and
> that was it, no performance worries.
> I particularly liked the interaction/droplevels trick.
>
> Rui Barradas
>
> On 25-07-2012 22:13, William Dunlap wrote:
>>
>> Rui,
>> Your solution works, but it can be faster for large data.frames if you
>> compute the indices of the desired rows of the input data.frame and then
>> use one subscripting call to select the rows, instead of splitting the
>> input data.frame into a list of data.frames, extracting the desired row
>> from each component, and then calling rbind to put the rows together
>> again. E.g., compare your approach, which I've put into the function f1
>>    f1 <- function (dataFrame) {
>>        retval <- with(dataFrame, sapply(split(dataFrame, list(PTID, Year)),
>>            function(x) if (nrow(x)) x[which.max(x$Count), ]))
>>        retval <- do.call(rbind, retval)
>>        rownames(retval) <- 1:nrow(retval)
>>        retval
>>    }
>> with one that computes a logical subscripting vector (by splitting just
>> the Count vector, not the whole data.frame)
>>    f2 <- function (dataFrame) {
>>        keep <- as.logical(ave(dataFrame$Count,
>>            droplevels(interaction(dataFrame$PTID, dataFrame$Year)),
>>            FUN = function(x) if (length(x)) seq_along(x) == which.max(x)))
>>        dataFrame[keep, ]
>>    }
>>
>> They both compute the same thing, aside from the fact that the rows
>> are in a different order (f2 keeps the order of the original data.frame)
>> and f2 leaves the original row label with the row.
>> > f1(df1)
>>   PGID  PTID Year Visit Count
>> 1 6755 53122 2008     3     1
>> 2 6755 53121 2009     1     0
>> 3 6755 53122 2009     3     2
>> > f2(df1)
>>   PGID  PTID Year Visit Count
>> 1 6755 53121 2009     1     0
>> 6 6755 53122 2008     3     1
>> 9 6755 53122 2009     3     2
>> When there are a lot of output rows, f2 can be quite a bit faster.
>>
>> (I put the call to droplevels(interaction(...)) into the call to ave
>> because ave can waste a lot of time calling FUN for nonexistent
>> interaction levels.)
>>
>> Bill Dunlap
>> Spotfire, TIBCO Software
>> wdunlap tibco.com
>>
>>> -----Original Message-----
>>> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
>>> On Behalf Of Rui Barradas
>>> Sent: Wednesday, July 25, 2012 10:24 AM
>>> To: kborgmann
>>> Cc: r-help
>>> Subject: Re: [R] Select rows based on matching conditions and logical
>>> operators
>>>
>>> Hello,
>>>
>>> Apart from the output order, this does it.
>>> (I have changed 'df' to 'df1'; 'df' is an R function, the F distribution
>>> density.)
>>>
>>> df1 <- read.table(text="
>>> PGID PTID Year Visit Count
>>> 6755 53121 2009 1 0
>>> 6755 53121 2009 2 0
>>> 6755 53121 2009 3 0
>>> 6755 53122 2008 1 0
>>> 6755 53122 2008 2 0
>>> 6755 53122 2008 3 1
>>> 6755 53122 2009 1 0
>>> 6755 53122 2009 2 1
>>> 6755 53122 2009 3 2", header=TRUE)
>>>
>>> df2 <- with(df1, sapply(split(df1, list(PTID, Year)),
>>>     function(x) if (nrow(x)) x[which.max(x$Count), ]))
>>> df2 <- do.call(rbind, df2)
>>> rownames(df2) <- 1:nrow(df2)
>>> df2
>>>
>>> which.max(), not which().
>>>
>>> Hope this helps,
>>>
>>> Rui Barradas
>>> On 25-07-2012 18:10, kborgmann wrote:
>>>>
>>>> Hi,
>>>> I have a dataset in which I would like to select rows based on matching
>>>> conditions and return the maximum value of a variable, else return one
>>>> row if duplicate counts exist.
>>>> My dataset looks like this:
>>>> PGID PTID Year Visit Count
>>>> 6755 53121 2009 1 0
>>>> 6755 53121 2009 2 0
>>>> 6755 53121 2009 3 0
>>>> 6755 53122 2008 1 0
>>>> 6755 53122 2008 2 0
>>>> 6755 53122 2008 3 1
>>>> 6755 53122 2009 1 0
>>>> 6755 53122 2009 2 1
>>>> 6755 53122 2009 3 2
>>>>
>>>> I would like to select rows if PTID and Year match and return the
>>>> maximum count, else return one row if counts are the same, such that I
>>>> get this output:
>>>> PGID PTID Year Visit Count
>>>> 6755 53121 2009 1 0
>>>> 6755 53122 2008 3 1
>>>> 6755 53122 2009 3 2
>>>>
>>>> I tried the following code and the output is almost correct, but
>>>> duplicate values were included:
>>>> df2<-with(df, sapply(split(df, list(PTID, Year)),
>>>>     function(x) if (nrow(x)) x[which(x$Count==max(x$Count)),]))
>>>> df2<-do.call(rbind,df2)
>>>> rownames(df2)<-1:nrow(df2)
>>>>
>>>> Any suggestions?
>>>> Thanks much for your responses!
>>>>
>>>> --
>>>> View this message in context:
>>>> http://r.789695.n4.nabble.com/Select-rows-based-on-matching-conditions-and-logical-operators-tp4637809.html
>>>> Sent from the R help mailing list archive at Nabble.com.
>>>>
>>>> ______________________________________________
>>>> R-help@r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.

--

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
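
For anyone wanting to check the question at the top of this message, here is a minimal sketch that builds the grouping three ways (interaction(..., drop=TRUE), droplevels(interaction(...)), and paste()) on the df1 data from the thread and compares the rows selected. The helper pick_max is hypothetical (it is Bill's f2 with the grouping passed in as an argument) and the variable names are assumptions for illustration only. The last two lines show one case where paste() and interaction() differ: a missing value in a grouping column.

```r
# Data from the thread (Rui's df1).
df1 <- read.table(text = "
PGID PTID Year Visit Count
6755 53121 2009 1 0
6755 53121 2009 2 0
6755 53121 2009 3 0
6755 53122 2008 1 0
6755 53122 2008 2 0
6755 53122 2008 3 1
6755 53122 2009 1 0
6755 53122 2009 2 1
6755 53122 2009 3 2", header = TRUE)

# Three candidate grouping vectors for the (PTID, Year) pairs.
g_drop  <- with(df1, interaction(PTID, Year, drop = TRUE))
g_dl    <- with(df1, droplevels(interaction(PTID, Year)))
g_paste <- with(df1, paste(PTID, Year))

# Hypothetical helper: Bill's f2 with the grouping supplied by the caller.
# Within each group, flag the first row holding the maximum Count.
pick_max <- function(d, g) {
  keep <- as.logical(ave(d$Count, g,
                         FUN = function(x) seq_along(x) == which.max(x)))
  d[keep, ]
}

r_drop  <- pick_max(df1, g_drop)
r_dl    <- pick_max(df1, g_dl)
r_paste <- pick_max(df1, g_paste)

# Do the two interaction() variants agree, and do all three select the same rows?
print(identical(levels(g_drop), levels(g_dl)))
print(identical(r_drop, r_dl) && identical(r_drop, r_paste))

# One difference: paste() turns NA into the string "NA" (its own group),
# while interaction() propagates NA.
na_paste <- paste(c(53121, NA), 2009)
na_inter <- interaction(c(53121, NA), c(2009, 2009), drop = TRUE)
print(na_paste)
print(is.na(na_inter))
```

On this data the three groupings pick the same rows, so the choice here looks like a matter of taste. The things to watch with paste() are that it returns a character vector rather than a factor, that NA becomes the literal string "NA", and that values containing the separator can collide (for example, paste("a b", "c") and paste("a", "b c") give the same string).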